Elegant C: Performance of Atomic Operations on NUMA Systems

Wednesday, May 2, 2018

Performance of Atomic Operations on NUMA Systems

It is the end of the semester, so it is time for posters about student projects. I have visited two sessions so far, with three more to go. I specifically want to highlight the results from one poster.

The pair of students wrote a microbenchmark around compare-and-swap: the value is read, a local update is computed, and then compare-and-swap attempts to place the new value into memory iff the old value is still present; otherwise it fails and the thread retries. Running this code in a tight loop with one thread per hardware context, there is clearly going to be significant contention. In this scenario, they had two observations from the results:

If the requesting thread is located on the same node as the memory, its compare-and-swap will almost always fail. This implies that accesses to NUMA-local memory take a different path than accesses to NUMA-remote memory, and that this path performs worse on contended atomic operations.
The Intel processors had a higher success rate, as neighboring threads were more likely to pass access along to each other. The AMD system did not exhibit this behavior.

Caveats: The precise NUMA topology was not known, and the AMD processors were several generations older than the Intel processors.
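
I do not have the students' code, but the read / compute / compare-and-swap / retry loop described above can be sketched roughly as follows. This is only a minimal sketch using C11 atomics and pthreads. The names (`shared_value`, `struct cas_stats`, `cas_worker`, `run_cas_benchmark`, `MAX_THREADS`) are my own assumptions for illustration, and the sketch does not pin threads to cores or place the memory on a particular NUMA node, which the real benchmark would need in order to separate local from remote accesses.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* The single shared location that every thread contends on. */
static _Atomic uint64_t shared_value;

/* Per-thread tallies (hypothetical layout, not from the poster). */
struct cas_stats {
    uint64_t iterations;  /* input: successful updates to perform */
    uint64_t failures;    /* output: CAS attempts that lost the race */
};

/* Thread body: perform `iterations` increments through a CAS retry loop. */
static void *cas_worker(void *arg)
{
    struct cas_stats *stats = arg;
    uint64_t failures = 0;  /* kept local so the hot loop touches no shared stats */

    for (uint64_t i = 0; i < stats->iterations; i++) {
        /* Read the current value. */
        uint64_t old = atomic_load(&shared_value);
        /* Compute the update (old + 1) and try to install it. On failure,
         * the call stores the value it actually found into `old`, so the
         * update is recomputed from fresh data on the retry. */
        while (!atomic_compare_exchange_strong(&shared_value, &old, old + 1)) {
            failures++;
        }
    }
    stats->failures = failures;
    return NULL;
}

/* Run `nthreads` workers, each doing `iterations` successful updates.
 * Returns 0 on success, -1 on a bad argument or thread creation failure. */
static int run_cas_benchmark(int nthreads, uint64_t iterations,
                             struct cas_stats *stats)
{
    pthread_t threads[MAX_THREADS];
    int created = 0;
    int rc = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) {
        return -1;
    }
    atomic_store(&shared_value, 0);

    for (int i = 0; i < nthreads; i++) {
        stats[i].iterations = iterations;
        stats[i].failures = 0;
        if (pthread_create(&threads[i], NULL, cas_worker, &stats[i]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    /* Join whatever was started, even if a later create failed. */
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
    }
    return rc;
}
```

After `run_cas_benchmark` returns, each thread's success rate is `iterations / (iterations + failures)`, which is the kind of per-thread number the two observations compare. Every successful compare-and-swap adds exactly one, so the final `shared_value` should equal `nthreads * iterations`, which makes for an easy sanity check.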
