Thursday, March 7, 2019

Talk: Concurrent Data Structures for Non-Volatile Memory

Today, Michal Friedman, gave a talk on Concurrent Data Structures for Non-Volatile Memory.

Future systems will contain non-volatile memory.  This is memory that exhibits normal DRAM characteristics, but can maintain its contents even across power failures.  In current systems, caches update memory on either evictions and flushes.  Flushes, however, impose overhead due to the memory access time and overriding the write-back nature of most caches.

Linearizability is one definition for concurrency governing the observation of the operations.  This can be extended to durable linearizability being on a durable system, such that data is flushed before global visibility (initialization), flush prior operations (dependence), and persist operations before they complete (completion).  But a further extension is required to know when a sequence of operations are complete, beyond just taking snapshots of the memory state.

Relaxed, durable, and log versions of lock-free queue that extend Michael and Scott's baseline queue implementation.  Each version provides stronger guarantees: relaxed are the existing augmented with a sync operation to snapshot state, durable preserves the data structure across failures, log identifies the specific state.  The main guarantee is that the data structure will be consistent for any set of thread crashes, which is stronger than the lock-free guarantee.

We do this by extending the prior lock-free versions that include memory flushes of key state, and that later update which see volatile state will flush that state before completing their operations.  This meets the durable linearizability.  And can be extended by also have a log of operations that are updated and maintained before the operations themselves execute.  These logs are per-thread, so as to be unordered and to be individually stateful.

The relaxed version implements sync by creating a special object that indicates a snapshot is occurring.  If other concurrent operations find this object, they take over the snapshot and continue persisting the state before completing its own operation.  Thus a snapshot does not block other operations, but still occurs at that point in the sequence of operations.

Based on performance measurements, the relaxed performs similar to the baseline implementation, while the durable and log-based implementations run slower than the relaxed but with similar performance.

Finally, TSO provides us a guarantee that the stores will reach the cache line in a desired order and not require flushing between writes.

No comments: