All posts tagged 'INQ'

.NET Musings

Using LINQ To Select Records Based on Child Collections

Technorati Tags: ,

Yesterday, after my session on LINQ at MADExpo, a gentleman asked me how I would solve a certain problem that he is facing at work with LINQ.  Here is my solution (turns out I was a little brain dead after giving three sessions at the conference, so while he was sitting with me, I just couldn’t see it).

The Problem

The problem consists of a list of salespeople that have sales.  Here are the classes involved:

public class SalesPerson
    public string Name { get; set; }
    public IList<Sales> Sales { get; set; }
public class Sales
    public string Day { get; set; }
    public decimal Amount { get; set; }

The gentleman wanted to know (by day) the top sales people.  If the relationship was bi-directional (there was a link to SalesPerson from the Sales class), this would have been an extremely simple solution.  But since the relationship is only one way, there was a little more thinking involved.

Creating the Test

To setup the test, we had to create some data and some asserts to validate that the LINQ statement pulled back the correct SalesPeople.  Here is the test:

public void ShouldGetAverageDailySales()
    SalesPerson bob = new SalesPerson() {
        Name = "Bob",
        Sales = new List<Sales>() {
            CreateSales("Mon", 50),
            CreateSales("Tue", 50),
            CreateSales("Wed", 500),
            CreateSales("Thu", 50),
            CreateSales("Fri", 50) } };
    SalesPerson sue = new SalesPerson() {
        Name = "Sue",
        Sales = new List<Sales>() {
            CreateSales("Mon", 100),
            CreateSales("Tue", 400),
            CreateSales("Wed", 100),
            CreateSales("Thu", 300),
            CreateSales("Fri", 100) } };
    SalesPerson tom = new SalesPerson() {
        Name = "Tom",
        Sales = new List<Sales>() {
            CreateSales("Mon", 195),
            CreateSales("Tue", 200),
            CreateSales("Wed", 200),
            CreateSales("Thu", 200),
            CreateSales("Fri", 2000) } };
    IList<SalesPerson> people = new List<SalesPerson>() {
        tom };
    //Create LINQ Statement Here
    Assert.AreEqual("Tom", sales[0].Name);
    Assert.AreEqual(195, sales[0].Amount);
    Assert.AreEqual("Mon", sales[0].Day);
    Assert.AreEqual("Sue", sales[1].Name);
    Assert.AreEqual(400, sales[1].Amount);
    Assert.AreEqual("Tue", sales[1].Day);
    Assert.AreEqual("Bob", sales[2].Name);
    Assert.AreEqual(500, sales[2].Amount);
    Assert.AreEqual("Wed", sales[2].Day);
    Assert.AreEqual("Sue", sales[3].Name);
    Assert.AreEqual(300, sales[3].Amount);
    Assert.AreEqual("Thu", sales[3].Day);
    Assert.AreEqual("Tom", sales[4].Name);
    Assert.AreEqual(2000, sales[4].Amount);
    Assert.AreEqual("Fri", sales[4].Day);

We have one helper method for the test, shown here:

private Sales CreateSales(string day, decimal amount)
    return new Sales() {
        Day = day,
        Amount = amount };

Breaking It Down

The approach that I took (and if readers have a more elegant solution, please post a comment) was to first determine the maximum sales by day (note my solution does not take into account ties).

The first step was to get the Sales classes into their own collection.  To do this we use “SelectMany”.  The following LINQ statement will return an IEnumerable<Sales>. If we just used “Select”, we would get IEnumerable<IEnumerable<Sales>>, which doesn’t help in the least.  Remember when you are pulling a child collection out of your source collection, you are typically going to use SelectMany (but your problem will really dictate what to use).

var p1 = people.SelectMany(p => p.Sales);

The next step is to group all of the sales data by day.  We do this with a groupby, selecting the Key value and the data to place in the group.  The following code illustrates this.

var p2 = p1.GroupBy(p => p.Day, p => p.Amount);

This creates a dictionary where the Day (from the Sales class) is the key, and all of the matching sales amounts become the data associated with the key. To access the data, we can inspect the created entity in the Immediate window.  Each item in the created IEnumerable has a Key property, and the Value of the data is an IList<decimal>, that we can enumerate over with a simple ToList() call.

? p2.ToList()[0].Key
? p2.ToList()[0].ToList()
Count = 3
    [0]: 50
    [1]: 100
    [2]: 195

This still isn’t what we want, since we need the Max sales per day.  Fortunately, from here, this is a simple step by using a Projection.  Projections are where we create a new object from an existing object(s). In the following code snippet, we are creating a new Sales object for each key.  The unique key (Day from the original collection) is assigned the Day property in the new object, and the Amount is the maximum value of that Day’s records in the dictionary.

var p3 = p2.Select(r => 
      new Sales {Day = r.Key,Amount = r.Max() });

Putting it all together into one line, it looks like this:

var sales = people.SelectMany(p => p.Sales)
   .GroupBy(p => p.Day, p => p.Amount)
   .Select(r => new Sales {
        Day = r.Key,
        Amount = r.Max() });

This gives as a new IList of Sales objects, one for each Day in the source data, with the Max sales amount for that day. Examining the resulting IEnumerable in the Immediate window gives us this glimpse into the data:

? p3.ToList()
Count = 5
    [0]: {LinqSamples.A_BaseOperators.Sales}
    [1]: {LinqSamples.A_BaseOperators.Sales}
    [2]: {LinqSamples.A_BaseOperators.Sales}
    [3]: {LinqSamples.A_BaseOperators.Sales}
    [4]: {LinqSamples.A_BaseOperators.Sales}
? p3.ToList()[0]
    Amount: 195
    Day: "Mon"

Now that we have the top sales by day from the original collection, we need to go back and find the SalesPerson that had the top sales.

At first, it is tempting to write the following code into our LINQ expression, where “z” represents the top sales object from the previous LINQ statement above:

people.Where(p => p.Sales.Contains(z))

There are a couple of issues with this.  The first is that we create new Sales objects, and Contains is a reference operation. So it is trying to compare based on the actual references in memory, not the values of the Sales object.  Secondly, this statement gives us another list, and we need to get a single Value.  To fix the second problem, we call “FirstOrDefault”.  This will give us the first object returned, or null (since SalesPerson is a class) if there isn’t anything matching the predicate.

To solve the first problem, we need to create a custom comparer so that our code is comparing values instead of references.

public class SalesComparer : IEqualityComparer<Sales>
    public bool Equals(Sales x, Sales y)
        return (x.Day.Equals(y.Day) && x.Amount.Equals(y.Amount));
    public int GetHashCode(Sales obj)
        return obj.GetHashCode();

Now that we have resolved both of those issues, our final partial LINQ statement looks like this:

var p4 = p3.Select(z => 
     new { 
      Day = z.Day, 
      Name = people
         .Where(p => p
             .Contains(z, new SalesComparer()))
      Amount = z.Amount })

This creates a list of new objects that contain three properties, Day, Name, and Amount.

Putting it all together into one statement yields this:

var sales = people
   .SelectMany(p => p.Sales)
   .GroupBy(p => p.Day, p => p.Amount)
   .Select(r => new Sales {
                Day = r.Key,
                Amount = r.Max()
   .Select(z => new { 
       Day = z.Day, 
       Name = people
          .Where(p => p.Sales
             .Contains(z, new SalesComparer()))
       Amount = z.Amount });

So, there you have it.

Happy Coding!

Managed Windows Shared Hosting by OrcsWeb