
Wednesday, April 22, 2020

Thesis Proposal - Lightweight Preemptable Functions

Sol Boucher presented his thesis proposal today via Zoom.  It was well attended and I think many of us are already looking forward to his defense.

Lightweight Preemptable Functions (LPF)
To decide whether it can afford to invoke a function, a time-critical program needs some knowledge of that function's cost.

In the current multitasking landscape, the proposal first reviews three existing approaches: futures, threads, and processes.  Futures rely on the runtime to provide asynchronous execution, but preemption requires yield points; not all code can support these points, nor is it clear in advance what a given future will do.  With threads, execution is fairly independent, but they give the scheduler minimal information and cancellation is hard.  Processes provide good support for cancellation, except that on fork(), only the calling thread is carried over into the new process.  The other threads are canceled, which can leave the child in an inconsistent state.

The thesis proposes a novel abstraction for preemption at the granularity of a synchronous function call, governed by a timeout.

The API is launch(function, timeout, argument), which returns a continuation.  If the timeout elapses first, the continuation is returned to the caller, who then chooses whether to resume it or cancel it.  The LPF executes within the same thread context as the caller, which keeps overhead low; to be resumed later, however, the LPF needs its own stack.  To enforce the timeout, the runtime relies on a timer signal that can fire as often as every ~5 microseconds.  Launch and resume have overhead comparable to this, significantly better than fork or pthread_create.  Cancel, however, is extremely expensive.

LPFs also have an issue with calling functions that are non-reentrant, similar to the rules governing signal handlers.  To address this, the runtime provides selective relinking to capture what the LPF calls via the global offset table (GOT).  Some GOT entries point into dynamic libraries; others initially point at the dynamic linker.  The runtime must also intercept accesses to thread-local variables.  This interception imposes about 10 ns of overhead, only slightly above the cost of the function calls themselves.

Microservice approaches have significant latency, often in the hundreds of microseconds, primarily from the requirement to create a sufficiently isolated container, often via processes or virtualization.  If a microservice were instead written safely and used LPFs, its latency could be pushed toward the hardware bound, as measured by communicating between VMs or committing transactions.

Cancellation cleanup is difficult in languages, such as C, that require explicit cleanup.  In languages with exceptions, adding a new exception path for timeout and cancellation could invoke the necessary destructors.  Nonetheless, this can be expensive (perhaps single-digit milliseconds).

Other possible future work:
Cancellation also bears the cost of unloading the mirrored library; could the runtime instead track the changes made and then decide whether to roll them back or discard them?
Can the overhead of the preemption signals be reduced, or their granularity improved?

Monday, March 13, 2017

Book Review: Optimized C++: Proven Techniques for Heightened Performance

I have spent significant time on performance issues and have been in search of a book that summarizes the diversity of issues and techniques well.  I hoped that Optimized C++: Proven Techniques for Heightened Performance would provide some of the guidance I want, but this book is not quite it.  There is good material here, yet I repeatedly found myself thinking that the author was unaware of the past 10(?) years of changes to the field.  That would not be an issue if the book were from the early 2000s, but it was published last year.

A key step in improving the performance of a program is measuring it, and there are a variety of techniques for doing so: tools based on instrumentation and tools based on sampling profiling.  I find greater value in the sampling profilers (for measuring performance) due to their lower overhead and their ability to pinpoint where within a function the cost lies.  Yet the book's focus is limited to gprof-esque approaches.  I tell students that this approach is best suited to deep call trees, which may be a greater issue for C++ programming specifically.

The author is somewhat dismissive of compiler optimizations and emphasizes that the benefit he has observed has been largely limited to function inlining.  There are many more optimizations, and you should care about them.  But again, I wonder whether his experience of C++ has been deep call trees that would particularly benefit from inlining.

In another take-it-or-leave-it pronouncement, this work discourages the use of dynamic libraries.  Yes, they impose a performance penalty, but they also provide valuable functionality.  Whether you should statically or dynamically link your code depends on your use case.  Code that is reused by separate executables should go in a dynamic library, as that reduces memory requirements at run time and reduces the effort to patch and update those executables.  Components used by only a single executable should be statically linked, unless they are of significant size such that decoupling can still benefit memory usage and the updating process.

The author relates that replacing printf with puts to print a plain string has performance advantages, due to printf being a complicated "God function".  The basic point, that printf carries significant functionality, is valid; however, the anecdote should be taken with a grain of salt, as current compilers perform this replacement (printf with puts) automatically.

While most of the work provides small examples, the final chapters on concurrency (?) and memory management do not.  The concurrency chapter reads as a reference book, as it lists the various APIs available and what each does.  It would be better for the book to assume that the readers are familiar with these calls (as the author does with many other topics) and discuss possible optimizations within this scope.

To conclude, the book is not bad, but I also cannot say it is accurate on every point.  Especially with performance, programmers are apt to make prompt design decisions based on "their experience" or "recent publications".  Measure your code's performance.  Only then can you discern which techniques will provide value.