Monday, June 15, 2015

Conference Attendance FCRC - Day 3 - PLDI / ISCA

PLDI itself began this morning and after the welcome, we had three distinguished papers.  I am excited that two of these works focused on code performance and compilers, rather than higher-level programming language issues:

Automatically Improving the Accuracy of Floating Point Expressions - How do you address rounding error in your code?  Use formal numeric methods an expert can reduce the errors.  But rather than be an expert, they wrote a tool to use heuristics to apply these methods.  For example, what error do you have when evaluating the quadratic formula.  Based on just the value for b, there are different expressions that have much lower error.

The tool, Herbie, estimates the accuracy of the expression and then attempts to use algebraic transformations (from a database of 120 rules).  Having generated many candidate expressions, the tool then selects using dynamic programming an appropriate set of expressions across the input space.  First, it matches the example cases from the Hamming's Numeric Methods book.  And furthermore has found bugs in existing projects.

Diagnosing Type Errors with Class - SHErrLoc works to identify the likely cause of type errors.  Expressions are given constraints.  These constraints form a graph, which is analyzed for failing paths in the graph.  The tool then attempts to localize the failure and identify the minimal change to the constraints to satisfy the graph.  Even though it is not Haskell specific, it is more accurate at detecting type errors in Haskell programs than related work.

Provably Correct Peephole Optimizations with Alive - Compilers are buggy.  For example, LLVM's InstCombine is an LLVM pass that exploits the LLVM IR to improve performance, which contains many hand-rolled transformations.  Propose a DSL that describes Peephole Optimizations, where the the DSL is basically a simplified LLVM IR annotated with preconditions for the transformation.  Then the expression describing the transformation is passed through constraint checkers to verify it is correct.  And then generate C++ code for that transformation.

Correctness of the expression must not introduce new undefined behaviors, still produces the same result, and properly updates the memory state.  Initially proved the optimizations in InstCombine correct or identified bugs, and eventually could replace the pass with the generated version.  Furthermore, Alive was able to strengthen the post-conditions for many instructions (for example, identifying whether an operation will overflow).

In the afternoon, I was visiting the other side of the conference center with ISCA.  One paper of note there:

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing - They showed that (somehow, as I missed this point) integrating the simple core closer to the memory system, they pushed past the memory bandwidth of HMC (640GB/s) and instead to about 2.3TB/s. They focused on two pieces: an updated programming model for graphs and a prefetching system within their cores. The model introduced async remote procedure calls that are sent to the Tesseract core near the data. These messages accumulate in a queue until either a barrier or the queue is full. While they accumulate, the prefetcher is requesting the appropriate data so when the function fires, the data is available. The prefetcher is able to operate on the two separate streams: the local processing that is sequential and generating the remote requests, and then the remote requests received at this node.

No comments: