Wednesday, March 27, 2013

Transactional Memory and Intel's TSX

It was some years ago that I sat in the audience and heard AMD present a sketch for how transactional memory (TM) would be implemented in the x64 ISA.  More recently a fellow student mentioned that Intel has some new extensions entering the x64 ISA for locks, etc.  I'm always a fan of properly implemented locks, since locking so often limits performance and scalability.  So let's dig into Intel's TSX and why I actually want to go buy a gadget when it's released.

A programmer can delineate the transactional section with XBEGIN and XEND instructions.  Within the transactional section, all reads and writes are added to the read-set or the write-set accordingly.  The granularity for tracking is a cache line.  If another processor writes to a line in the read-set, or reads or writes a line in the write-set, then the transaction aborts.
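
To make that concrete, here is a minimal sketch of a transactional section written with the RTM compiler intrinsics from <immintrin.h> (compile with -mrtm).  The shared counter and the retry policy are placeholders of my own, and a real program also needs a non-transactional fallback path, since a transaction can always abort:

    #include <immintrin.h>   // _xbegin, _xend, _XBEGIN_STARTED

    static long counter = 0;               // hypothetical shared data

    bool try_transactional_increment()
    {
        unsigned status = _xbegin();       // XBEGIN: start tracking reads and writes
        if (status == _XBEGIN_STARTED) {
            ++counter;                     // this cache line joins the write-set
            _xend();                       // XEND: commit if no conflicting access occurred
            return true;
        }
        // A conflicting access (or capacity, interrupt, ...) aborted the
        // transaction; status encodes the reason.  The caller should retry
        // or fall back to an ordinary lock.
        return false;
    }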

Transactions can be nested, but only in a flattened sense.  An inner transaction does not commit on XEND; the nest can only commit once the outermost transaction is complete.  If any transaction in the nest aborts, then the entire nest aborts.  When the number of XENDs executed matches the number of XBEGINs, the entire transaction commits and becomes globally visible.  Transactions can also be aborted explicitly by the XABORT instruction, which enables the code to abort early when it can determine that the transaction will or should fail.
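
As a sketch of the explicit-abort case, XABORT carries an 8-bit code that the abort handler can recover.  The validation check below is a made-up example of "this transaction should fail," not anything Intel prescribes:

    #include <immintrin.h>

    bool process(int* slot)                // hypothetical work item
    {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (*slot < 0)
                _xabort(0x42);             // XABORT: give up early with a code
            *slot += 1;
            _xend();
            return true;
        }
        // _XABORT_EXPLICIT is set when XABORT caused the abort, and
        // _XABORT_CODE extracts the 8-bit code that was passed to it.
        if ((status & _XABORT_EXPLICIT) && _XABORT_CODE(status) == 0x42)
            return false;                  // the data said not to bother retrying
        return false;                      // aborted for another reason (conflict, capacity, ...)
    }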

As I understand it, TSX is being built on top of the existing cache coherence mechanisms.  Each cache line gains an additional bit to indicate whether it is part of a transaction.  Each memory operation is treated normally between the processor and the coherence hierarchy, with several caveats.  If a dirty, transactional block is evicted, then the transaction fails.  If a dirty, transactional block is demand downgraded from modified to shared or invalid, then the transaction fails.  In the latter case, a new coherence message would indicate that the request to forward the data has failed and that the request should instead be satisfied by memory.

If the transaction commits, then the transactional bits are cleared on each cache line.  And the lines operate normally according to existing cache coherence mechanisms.

Wrapping this up, TSX is an ISA extension that almost every program can take advantage of, and therefore one I'm eager to test for myself, just like building my own x64 machine back in 2005.

Monday, March 4, 2013

Maps and Red-Black Trees

While collaborating on a prior piece of research, Brainy: effective selection of data structures, I learned about the cost of selecting particular data structures.  I expect that every Computer Scientist knows the general cost of using standard data structures like trees and linked lists.  Now, the extension that Brainy provided was to recognize that a "tree" can have different implementations, and these implementations can have different costs for a given workload.  I, and I expect others, learned about AVL, splay, red-black and other tree types.

All this is good until your type hides its implementation and you reason about it in the abstract.  A map is commonly conflated with a particular implementation.  A map is a key-value store, where a "key" has an associated "value".  An address book can be understood as a map from name to address / phone number / etc.
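
For instance, here is the address book written against the map abstraction in C++ (the names and addresses are obviously made up).  Nothing in this code depends on how the map is implemented underneath:

    #include <iostream>
    #include <map>
    #include <string>

    int main()
    {
        // The name is the key; the address is the associated value.
        std::map<std::string, std::string> address_book;
        address_book["Alice"] = "12 Oak St.";
        address_book["Bob"]   = "34 Elm St.";

        // Look-up goes through the key; the caller never sees the tree inside.
        auto it = address_book.find("Alice");
        if (it != address_book.end())
            std::cout << it->first << " lives at " << it->second << "\n";
        return 0;
    }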

Now, a map is often assumed to be implemented as a hash table, giving O(1) expected look-up cost.  However, in the C++ standard library, map uses a red-black tree, giving O(log n) look-ups.  Before you go and request that the libraries be changed, understand that experimental measurements suggest that the red-black implementation wins out when n > 500, as hash collisions erode the idealized O(1) cost.  This is the problem that Brainy attempts to solve: the programmer selects the functionality, e.g., a map, and Brainy selects the best implementation given your program and architecture.
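
To see the two implementations side by side, the C++ standard library offers both std::map (red-black tree) and std::unordered_map (hash table) behind essentially the same interface.  The word-counting use below is just an illustration I picked to show that the functionality the programmer asked for stays the same while the implementation, and therefore the cost model, changes:

    #include <map>            // tree-based: O(log n) look-ups
    #include <string>
    #include <unordered_map>  // hash-based: O(1) expected look-ups

    // Same map functionality, different cost model.
    std::map<std::string, int>           tree_counts;
    std::unordered_map<std::string, int> hash_counts;

    void count(const std::string& word)
    {
        ++tree_counts[word];   // O(log n) per update, always
        ++hash_counts[word];   // O(1) expected, worse as collisions mount
    }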

So go ahead and use a map, but recognize that you may not have O(1) cost and definitely won't as n grows increasingly large.