
Thursday, September 12, 2019

Thesis Proposal: Theoretical Foundations for Modern Multiprocessor Hardware

Naama Ben-David gave her proposal this morning on Theoretical Foundations for Modern Multiprocessor Hardware.

Is there a theoretical foundation for why exponential backoff is a good design?  Exponential backoff is an algorithm that was developed through practice rather than from a theoretical analysis.

To develop such a foundation, we need a model of time; however, requests are asynchronous and do not follow a single time source.  To address this, time is modeled with adversarial scheduling.  Thus when performing a request, there are three sources of delay:
  • self-delay: backoff, sleep, local computation
  • system-delay: interrupts, context switches
  • contention-delay: delay caused by contention
Given this model, the adversary can, to a limited degree, decide when a request that has passed from self-delay into system-delay then moves to contention-delay and is ultimately completed.

In BBlelloch'17, this model was applied to measure the work of different approaches (a sketch of such a backoff loop follows the list):
  • With no backoff, the work is Ω(n³).
  • Exponential backoff reduces this to a Θ(n² log n) bound on work.
  • The paper also proposes a new algorithm that achieves O(n²) work with high probability.
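To make the backoff itself concrete, here is a minimal sketch (not from the paper) of a lock acquisition loop with exponential backoff; the waiting time counts as self-delay in the model above, and the names and constants are illustrative only.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <random>
#include <thread>

std::atomic<bool> locked{false};

void acquire_with_backoff() {
    std::minstd_rand rng{std::random_device{}()};
    int backoff_us = 1;                         // current backoff cap, in microseconds
    bool expected = false;
    while (!locked.compare_exchange_weak(expected, true)) {
        expected = false;
        // Wait a random time up to the cap (self-delay in the model),
        // then double the cap so retries under contention spread out.
        std::uniform_int_distribution<int> wait(0, backoff_us);
        std::this_thread::sleep_for(std::chrono::microseconds(wait(rng)));
        backoff_us = std::min(backoff_us * 2, 1 << 14);  // bound the maximum backoff
    }
}
```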
The second phase of work is developing simple and efficient algorithms for systems that have non-volatile memory (NVRAM).  With NVRAM, the contents of memory persist across a crash or system failure and the subsequent reboot (or other restore).  This permits the system to restore running programs to a finer degree than auto-saves or other current techniques allow.  However, systems also have caches, which are not persistent.  Caches are managed by hardware, which decides when to write contents back to memory.  Algorithms must therefore work with the caches to ensure that results are safely in memory at selected points of execution.  There are a variety of approaches for how to select these points.
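To make such a "selected point" concrete, here is a minimal sketch, assuming an x86 machine with cache-line flush instructions and data mapped to NVRAM; persist_int is an illustrative helper, not an API from the proposal.

```cpp
#include <immintrin.h>   // _mm_clflush, _mm_sfence

// Write a value and explicitly push it out of the volatile cache so it is
// durable in NVRAM at this point of execution (illustrative helper).
void persist_int(int* addr, int value) {
    *addr = value;        // the store initially lands in the write-back cache
    _mm_clflush(addr);    // write the cache line back to (non-volatile) memory
    _mm_sfence();         // order the flush before any later stores
}
```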

The third phase of work is modeling RDMA (remote direct memory access) systems.  Can there be a model of the different parts of such a system: memory, the NIC (network interface card), and the CPU?  Given such a model, the work would then explore contention as well as possible failures in the system.

One scheme is for every process to also be able to send messages on behalf of its shared-memory neighbors, so that even if a process fails, it can still participate in algorithms such as consensus.

As this is a proposal, ongoing work will implement these algorithms to measure their practical performance.

Thursday, March 7, 2019

Talk: Concurrent Data Structures for Non-Volatile Memory

Today, Michal Friedman gave a talk on Concurrent Data Structures for Non-Volatile Memory.

Future systems will contain non-volatile memory: memory that exhibits normal DRAM characteristics but maintains its contents even across power failures.  In current systems, caches update memory on either evictions or flushes.  Flushes, however, impose overhead, due both to the memory access time and to overriding the write-back nature of most caches.

Linearizability is one correctness definition for concurrency, governing how operations may be observed.  It can be extended to durable linearizability for a durable system: data is flushed before it becomes globally visible (initialization), prior operations are flushed before an operation depends on them (dependence), and operations are persisted before they complete (completion).  But a further extension is required to know when a sequence of operations is complete, beyond just taking snapshots of the memory state.

The talk presented relaxed, durable, and log versions of a lock-free queue that extend Michael and Scott's baseline queue implementation.  Each version provides stronger guarantees: relaxed is the existing queue augmented with a sync operation to snapshot state, durable preserves the data structure across failures, and log identifies the specific state reached.  The main guarantee is that the data structure will be consistent for any set of thread crashes, which is stronger than the lock-free guarantee.

This is done by extending the prior lock-free versions to include memory flushes of key state, and by having later updates that see volatile state flush that state before completing their own operations; together this meets durable linearizability.  It can be extended further by also keeping a log of operations that is updated and persisted before the operations themselves execute.  These logs are per-thread, so they are unordered and each tracks its own thread's state.
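A minimal sketch of where such flush points sit in a Michael-and-Scott-style durable enqueue; FLUSH, Node, and the omitted head/reclamation logic are simplifications and assumptions on my part, not the paper's actual code.

```cpp
#include <atomic>
#include <immintrin.h>

// Persist one cache line: write-back plus a fence (illustrative helper).
static void FLUSH(const void* p) { _mm_clflush(p); _mm_sfence(); }

struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

// Tail of the queue; assumed to start pointing at a dummy node.
std::atomic<Node*> tail;

void enqueue(int v) {
    Node* n = new Node(v);
    FLUSH(n);                                   // persist the node before it becomes reachable
    while (true) {
        Node* last = tail.load();
        Node* next = last->next.load();
        if (next == nullptr) {
            if (last->next.compare_exchange_strong(next, n)) {
                FLUSH(&last->next);             // persist the link before the operation completes
                tail.compare_exchange_strong(last, n);
                return;
            }
        } else {
            FLUSH(&last->next);                 // help: persist a link another thread installed
            tail.compare_exchange_strong(last, next);
        }
    }
}
```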

The relaxed version implements sync by creating a special object that indicates a snapshot is occurring.  If other concurrent operations find this object, they take over the snapshot and finish persisting the state before completing their own operations.  Thus a snapshot does not block other operations, but still takes effect at that point in the sequence of operations.
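A minimal sketch of this helping idea as I read it, not the paper's implementation: sync publishes a request object, and any operation that finds an unfinished request persists the shared state and marks the request done before doing its own work; persist_shared_state and the queue operation are assumed placeholders.

```cpp
#include <atomic>

struct SyncRequest {
    std::atomic<bool> done{false};
};

// The currently pending snapshot request, if any.
std::atomic<SyncRequest*> pending{nullptr};

// Assumed helper: flushes the data structure's state to NVRAM.
void persist_shared_state() { /* flush the structure's cache lines here */ }

// Finish any in-flight snapshot before proceeding with our own operation.
void help_pending_sync() {
    SyncRequest* req = pending.load();
    if (req != nullptr && !req->done.load()) {
        persist_shared_state();        // take over the snapshot
        req->done.store(true);         // let the caller of sync observe completion
    }
}

void sync() {
    // Leaked here for brevity; a real implementation must reclaim requests safely.
    SyncRequest* req = new SyncRequest();
    pending.store(req);
    help_pending_sync();               // do (or finish) the snapshot ourselves
    pending.store(nullptr);
}

void some_queue_operation() {
    help_pending_sync();               // the snapshot takes effect before this operation
    // ... perform the actual enqueue or dequeue here ...
}
```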

Based on performance measurements, the relaxed version performs similarly to the baseline implementation, while the durable and log-based implementations run slower than the relaxed version but perform similarly to each other.

Finally, TSO provides a guarantee that stores will reach the cache in the desired order, without requiring flushes between the writes to enforce that ordering.