Elegant C: June 2015

Wednesday, June 17, 2015

Conference Attendance FCRC - Day 5 - Plenary Summary

Plenary Talk today, which pulls together all of the conference attendees. Sunday's talk was based in databases, with Michael Stonebraker speaking on his Turing-award winning work. Monday's talk discussed interdisciplinary work, primarily centered in CS theory, and was given by Andrew Yao (a prior Turing Award winner). On Tuesday, Olivier Temam discussed neural networks in hardware, which focused on his work and efforts to better model or mimic the capabilities of the brain.

The F# Path to Relaxation -
There are opportunities to introduce new work toward relaxing and improving. Or perhaps create opposing camps. Thesis <-> Antithesis ==> synthesis. Or Functional <=> Interop. Back in 2003, functional languages were isolated, non-interoperable, using their own VMs. F# (along with Scala, Swift, ...) instead seeks to have an exosystem, being the external industry-standard runtimes. Another tension is between Enterprise and Openness. So F# is open and cross-platform. Tools are available for Android and iOS, as well as packages for Linux.

Functional <=> Objects
Thus embrace objects, without being object-oriented. Some cases in the cross-product of the expected features for objects and functions requires particular care for synthesis.

Circularities and Modularity in the Wild
Lambdas, generics, etc are clearly being embraced in modern language design. However, circular type dependencies are unfortunately also widely present. Languages need to enforce acyclicity.

Pattern Matching <=> Abstraction
How does the language support the functional concept of pattern matching, when you want to include type abstraction? Alas, the speaker skipped the solution quickly.

Code <=> Data
Most development is to providing tools for the information revolution. There is exponential growth in Open APIs for accessing data from the internet. This data then comes with dynamic types, where the types are only known once the data (or schema) has been accessed. The type creation can also enable blending code for other languages into the F# environment. For example, the support can allow opening csv or json files and having types for the data. This feature is, by far, the most exciting and interesting of the presentation. Not quite worth the price of admission, but clearly a great development.

Applied PL design comes from the synthesis at the heart of these contradictions. This tension also is part of the proliferation of languages.

Conference Attendance FCRC - Day 4 - PLDI

PLDI starts off this morning with Concurrency. As a student volunteer, I worked this session and was limited as to what I could note about the content itself.

Composing Concurrency Control - Introducing more diverse and finer-grained locking mechanisms. The tool works to develop a locking strategy that will guarantee serializability, abort-safety, opacity, and deadlock-freedom. It particularly works to integrate both locking schemes as well as transactional memory.

In the afternoon, I can dive into the semantics of the C language.

A Formal C Memory Model Supporting Integer-Pointer Casts - What optimizations are possible in the presence of pointers, pointer arithmetic, and integer-pointer casts? For example, can constants be propagated or is their location potentially targetable by a pointer? Other optimizations are explored in their paper. In practice, as code can generate arbitrary addresses, how can the compiler reason about any specific location in memory.

Defining the Undefinedness of C - Extending their prior work that gave semantics to defined behavior of C programs, which required doubling the rules to describe the semantic behavior. Fundamentally, any instance of undefined behavior that will be definitely encountered in an execution will invalidate that execution. For example, dividing by zero after a printf is valid to crash before the printf. The following code example is also undefined.

return (x = 1) + (x = 2);

Many of these cases are dependent on runtime behavior, and therefore a tool that can help identify them is valuable.

Monday, June 15, 2015

Conference Attendance FCRC - Day 3 - PLDI / ISCA

PLDI itself began this morning and after the welcome, we had three distinguished papers. I am excited that two of these works focused on code performance and compilers, rather than higher-level programming language issues:

Automatically Improving the Accuracy of Floating Point Expressions - How do you address rounding error in your code? Use formal numeric methods an expert can reduce the errors. But rather than be an expert, they wrote a tool to use heuristics to apply these methods. For example, what error do you have when evaluating the quadratic formula. Based on just the value for b, there are different expressions that have much lower error.

The tool, Herbie, estimates the accuracy of the expression and then attempts to use algebraic transformations (from a database of 120 rules). Having generated many candidate expressions, the tool then selects using dynamic programming an appropriate set of expressions across the input space. First, it matches the example cases from the Hamming's Numeric Methods book. And furthermore has found bugs in existing projects.

Diagnosing Type Errors with Class - SHErrLoc works to identify the likely cause of type errors. Expressions are given constraints. These constraints form a graph, which is analyzed for failing paths in the graph. The tool then attempts to localize the failure and identify the minimal change to the constraints to satisfy the graph. Even though it is not Haskell specific, it is more accurate at detecting type errors in Haskell programs than related work.

Provably Correct Peephole Optimizations with Alive - Compilers are buggy. For example, LLVM's InstCombine is an LLVM pass that exploits the LLVM IR to improve performance, which contains many hand-rolled transformations. Propose a DSL that describes Peephole Optimizations, where the the DSL is basically a simplified LLVM IR annotated with preconditions for the transformation. Then the expression describing the transformation is passed through constraint checkers to verify it is correct. And then generate C++ code for that transformation.

Correctness of the expression must not introduce new undefined behaviors, still produces the same result, and properly updates the memory state. Initially proved the optimizations in InstCombine correct or identified bugs, and eventually could replace the pass with the generated version. Furthermore, Alive was able to strengthen the post-conditions for many instructions (for example, identifying whether an operation will overflow).

In the afternoon, I was visiting the other side of the conference center with ISCA. One paper of note there:

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing - They showed that (somehow, as I missed this point) integrating the simple core closer to the memory system, they pushed past the memory bandwidth of HMC (640GB/s) and instead to about 2.3TB/s. They focused on two pieces: an updated programming model for graphs and a prefetching system within their cores. The model introduced async remote procedure calls that are sent to the Tesseract core near the data. These messages accumulate in a queue until either a barrier or the queue is full. While they accumulate, the prefetcher is requesting the appropriate data so when the function fires, the data is available. The prefetcher is able to operate on the two separate streams: the local processing that is sequential and generating the remote requests, and then the remote requests received at this node.

Sunday, June 14, 2015

Conference Attendance FCRC - Day 1 - WCAE / SPAA

In Portland for the next 5 days attending the Federated Computing Research Conference, which is a vast co-location of the top ACM conferences. For my part, this includes ISCA and PLDI. Following registration and checking in as a student volunteer, I ducked in to the Workshop on Computer Architecture Education (WCAE). There were a couple of presentations on different tools being used to teach architectural concepts.

Following the morning break, it was time for the keynote for SPAA, given by Hans-J Boehm, titled, "Myths and Misconceptions about Threads". For example,

#include "foo"
f() {
foo_t x, a;
...
x = a; // Is this atomic?
}

Which lead to the discussion of 'is assignment atomic?' and the audience tossed out increasing complex examples of how it is not. Fundamentally, the programming model is becoming "data-race free", and the specifications can treat races as "undefined behavior". In general, a sequential program will view its execution following the sequential consistency model, even if the hardware is executing the code with a weaker model.

What then should the programming language provide for the atomics / synchronization? Recall that the compiler has considerable flexibility for emitting the final program. With data-race free code, the compiler is treating anything that is not an atomic as part of sequential code and therefore subject to any reordering that would still be valid sequentially. The following example is how this can go awry. X is a global, and the compiler could substitute x anyplace tmp is, because the model assumes "there are no races on x". And if the program does happen to modify x is a racy manner, then the behavior is undefined.

bool tmp = x;
if (tmp) f = new ...
...
if (tmp) f->foo();

Gah! The programmer wanted to take a snapshot of the global value, but ended up with a different result. So the atomics are becoming more than just the "hacker's" way to quickly update shared values, and instead can be seen as annotations to the compiler to clearly encapsulate the shared state. This means the type of x is not bool, but atomic<bool>. Then the compiler knows the programmer's (likely) intent of this code. And this then rolls back to a deeper question of my research, "What could the system do more efficiently if it knew more about the programmer's intent?"