Creating Collection Indexes (Look-ups)


I would imagine that every developer out there has worked, or will soon work, with collections of objects. I would also imagine that the technique I'm going to go over isn't anything new to most folks. However, I was going over some of my JavaScript code with a more associate-level developer the other day, and this concept was entirely new to him and seemed to have made a decent impact on him and how he approaches collections now. So, I thought that if it helped one person, it may help another! If it does, that's all that matters to me :)


The Scenario

Let's say we're building an application that gets data from an API over HTTP (i.e. a RESTful service). We're building a UI that is responsible for displaying information to the user based on the data the API sends us. Let's take a classical example of an order and a customer for our object types. Here's what the API gives when we submit a GET request for each:

// GET /some/api/orders returns:
[
    // one order in the collection
    {
        id: 12345,
        date: '01/24/2014',
        customer_id: 1,
        items: [
            {
                item_id: 21,
                quantity: 2
               }
        ],
        subtotal: 150.00
        total: 164.73
    }, //many orders follow...
]

// GET /som/api/customers returns: 
[
    // one customer in the collection
    {
        id: 21,
        first_name: 'Jill',
        last_name: 'Jones',
        shipping_address: '123 N. Sweet St. Nashville, TN 37211'
    }, //many customers follow...
]

Now, let's say that part of our web app was to show a list of recent orders and the fields we wanted to display were order.date, order.total, customer.first_name, and customer.last_name. As you can see, the API is returning reference fields for things like the customer that placed the order, so we have to essentially join those items together ourselves :)


One Approach - Nested Iterators

One common approach is to iterate through all orders by using the Array.map function and populate a customerName property onto each order by finding the customer by the referenced order.customer_id field. We can do this with the Array.filter method on the customer collection - but make no mistake, this is a nested iterator. This is easy, and with small collections may be quick enough.
Note: The assumption here is that each object has a unique ID - which is pretty standard.

var allOrders, allCustomers;

//assume the allCustomers and allOrders variables were populated with 
//    the response from the API - leaving out promises/callbacks for succinctness...
allCustomers = customerSvc.getListFromAPI();  
allOrders = ordersSvc.getListFromAPI();

//populate the Customer object onto each Order
allOrders = allOrders.map(function populateCustomer(order){

    var customerMatches = allCustomers.filter(function findById(customer){
        return customer.id === order.customer_id;
    });

    if(customerMatches.length > 0) {
        //get the first match (should only be 1 b/c of unique ID fields)
        //and populate its onto the order
        var match = customerMatches[0];
        order.customerName = match.first_name + ' ' + match.last_name;
    } else {
        //somehow have an unknown customer
        order.customerName = 'UNKNOWN CUSTOMER!!!!';
    }

    //return the modified order object
    return order;
});

And that works. However, look at what we're doing - for every single item in a collection, we're iterating through the items in a separate collection to find a match! This is a big deal and with even moderately-sized collections it can add up to huge impacts to performance!


What's The Big Deal?

Let's say that we have 100 unique customer records, and a total of 2500 unique order records. With the nested iterator approach, we start off iterating over all order records. That automatically puts our count of total iterations up to 2500. Inside of that initial iterator, we are then looking to see if any records in the customer collection have a matching id. So, we iterate over 100 customer records in each iteration! That's a ton of work! Now, our total count becomes 2500 + (2500 * 100), or 252,500 total iterations!!


A Better Way

Since we already know that each customer has a unique id, why not create a pivoted view of the collection that allows us to more easily find each customer by their id without having to filter the collection each time? Here's how easy this is:

//create a quick object that will hold our pivoted view
var _customersById = {};

//iterate *once* over the full customer list to populate the pivoted view 
//    (or index, look-up, whatever you prefer to call this :)
allCustomers.forEach(function populatePivotView(customer){  
    _customersById[customer.id] = customer;
});

//So simple, but what we've essentially created is something like this.
//Notice how the objects *property* is the same as its value's *id* field!
/*
{
    21: {id: 21, first_name: 'Bob', last_name: 'Smith', ...},
    354: {id: 354, first_name: 'Rachel', last_name: 'Rayne', ...},
    86: {id: 86, first_name: 'Jon', last_name: 'Jones', ...},
    215: {id: 215, first_name: 'Ashleigh', last_name: 'Riggs', ...}
}
*/

Note: With JavaScript, you can dynamically add properties like this, and object property names can be numbers. This technique may need to be adjusted to fit other languages :)

Perfect - we now have an object that makes it super easy to find a customer by its id field. Now, let's use this look-up object to help us populate the customerName property onto each order object!

//remember, we now have allOrders, allCustomers, and _customersById variables populated

//still going to use .map to populate the field, 
//    as it's a nice clean way of modifying objects in a collection
allOrders = allOrders.map(function populateCustomerName(order){  
    //find the customer by the customer_id field on the order object
    var customer = _customersById[order.customer_id];

    if(customer) {
        order.customerName = customer.first_name + ' ' + customer.last_name;
    } else {
        order.customerName = 'UNKNOWN CUSTOMER!!!!';
    }
    //return the modified object
    return order;
});

And that's going to end up giving us the same result-set, but will actually be a whole lot less work for the CPU. Here's how our iteration counts break down now: 100 for when we iterated over the customer collection to create the _customersById look-up object. 2500 for when we iterated over the order collection to populate the customerName property. That's it - no nested iterations here! So, our grand total is 100 + 2500 = 2600. That's a huge difference!

So, How Do You Do It?

As I said earlier - this technique isn't profound or earth-shattering in any way. However, it is also not something they cover in school so may be helpful to someone, somewhere :)

This is just one of many different techniques for this type of data manipulation. Have a way you prefer over a look-up object, or a way that is better? Please don't hesitate to share your way in the comments - I'm always up for learning new techniques ;-)

Loading Google+ Comments ...