Wednesday, May 16, 2018

Review: Lessons from Building Static Analysis Tools at Google

The Communications of the ACM recently ran several development articles, and I found the one on static analysis tools at Google particularly interesting.  The article works through how Google went about integrating static analysis tools into every developer's workflow.  The first lesson is that the tools have to be in the workflow, or developers will "forget" to use them.  The second is ensuring that the feedback is useful: each developer can mark a reported issue as either useful or incorrect, and if a tool exceeds a 10% false-positive rate, it is temporarily disabled until that tool's developers fix the offending checks.  The third issue is that some analyses are expensive; depending on the type of static analysis, the time required may be significant.  Thus the tools are classified into two camps: those that run on each compile, and those that run on each code review / commit.  It is also important that some tools can be temporarily disabled, so that during debugging or refactoring the code may temporarily mutate into an "unsafe" state to simplify the process.

Personally, I am glad that they are integrating analysis tools into the development workflow.  Much work has been done to find bugs and issues within source code, so it is good that these analyses can be utilized regularly to improve code quality.

(As a note, I have never worked for Google, so I can only write based on the ACM article and not personal experience.)

Wednesday, May 2, 2018

Performance of Atomic Operations on NUMA Systems

It is the end of the semester, so time for posters about student projects.  I visited two sessions so far with three more to go.  I specifically wanted to highlight the results from one poster.

The pair of students wrote a microbenchmark around compare-and-swap: the value is read, a local update is computed, and then compare-and-swap attempts to place the new value into memory if and only if the old value is still present; otherwise the operation fails and retries.  Running this code in a tight loop with a thread per hardware context clearly generates significant contention.  In this scenario, they had two observations from the results:
  1. If the requesting thread is located on the same node as the memory, it will almost always fail.  This implies that accessing NUMA-local memory takes a different path than NUMA-remote memory, thereby exhibiting worse performance on contended atomic operations.
  2. The Intel processors had a higher success rate, as neighboring threads were more likely to pass access along between each other.  The AMD system did not exhibit this behavior.
Caveats: the precise NUMA topology was not known, and the AMD processors were several generations older than the Intel processors.