Monday, March 4, 2013

Maps and Red-Black Trees

When cooperating on a prior work of research, Brainy: effective selection of data structures, I learned about the cost of selecting particular data structures.  I expect that every Computer Scientist knows the general cost of using standard data structures like trees and linked lists.  Now, the extension that Brainy provided was to recognize that a "tree" could have different implementations and these implementations can have different costs for a given workload.  I, and I expect others, learned about AVL, splay, red-black and other tree types.

All this is good until your type hides its implementation and you reason about it in abstract.  A map is commonly misinterpreted by its implementation.  A map is a key-value store, where a "key" has an associated "value".  An address book can be understood as a map of name to address / phone number / etc.

Now, the map is assumed to be implemented as a hash table, to give O(1) look-up cost.  However in the C++ standard library, map uses a red-black tree for O(log n).  Before you go and request that the libraries be changed, understand that experimental measurements suggest that the red-black implementation wins out when n > 500, as the hash collisions dominate the idealized O(1) cost.  This is the problem that Brainy attempts to solve: the programmer selects the functionality, e.g., map and Brainy selects the best implementation given your program and architecture.

So go ahead and use a map, but recognize that you may not have O(1) cost and definitely won't as n grows increasingly large.

No comments: