Wednesday, May 16, 2018

Review: Lessons from Building Static Analysis Tools at Google

The Communications of the ACM recently had several development articles, and I found the one on static analysis tools at Google particularly interesting.  The article works through how Google went about integrating static analysis tools into every developer's workflow.  And the tools have to be in the workflow, or developers will "forget" to use them.  The second problem with the tools is ensuring that the feedback is useful.  Currently, each dev will mark the items as either useful or incorrect.  If a tool exceeds a 10% false-positive rate, it is temporarily disabled until that tool's developers can fix the flagged issues.  The third issue with the tools is that some are expensive.  Depending on the type of static analysis, the time required may be significant.  Thus the tools are classified into two camps: on each compile, or on each code review / commit.  It is also important that some tools can be temporarily disabled, such that during debugging or refactoring the code may temporarily mutate into an "unsafe" state to simplify the process.

Personally, I am glad that they are integrating analysis tools into the development workflow.  Much work has been done to find bugs and issues within source code, so it is good that these analyses can be utilized regularly to improve code quality.

(As a note, I do not nor never have worked for Google, so I can only write based on the ACM article and not personal experience.)

Wednesday, May 2, 2018

Performance of Atomic Operations on NUMA Systems

It is the end of the semester, so time for posters about student projects.  I visited two sessions so far with three more to go.  I specifically wanted to highlight the results from one poster.

The pair of students wrote a microbenchmark around compare-and-swap, where the value is read, a local update is computed and then compare-and-swap attempts to place the new value into memory iff the old value is present, otherwise fail and retry.  Running the code in tight loop with a thread per hardware context, there is clearly going to be significant contention.  In this scenario, they had two observations from the results:
  1. If the requesting thread is located on the same node as the memory, it will almost always fail.  Implying that accessing NUMA local memory takes a different path than NUMA remote, thereby exhibiting worse performance on contended atomic operations.
  2. The Intel processors had a higher success rate as neighboring threads were more likely to pass along access between each other.  The AMD system did not exhibit this behavior.
Caveats: The precise NUMA topology was not known.  And the AMD processors were several generations older than the Intel processors.

Friday, April 27, 2018

Thesis Defense: Systems Support for Intermittent Computing

Today I attended Alexei Colin's thesis defense titled, Systems Support for Intermittent Computing.

For small, embedded devices, batteries are expensive / difficult, so energy can be harvested from RF, light, temp gradients, motion, et cetera.  In such a device, this direct energy source is insufficient to power the device, so a small capacitor (or other storage medium) retains this energy until the device can be powered for a short time.  The discharge provides an intermittent period of execution before the power source drops below the threshold for execution.  Programs can be annotated with latches or other progress points, such that execution after power failure can then resume at this point after the power is again available.

To model the computation, the program will be decomposed into tasks, where each task can only transfer control to other tasks, but contains arbitrary code.  Tasks will communicate through channels.  The channels provide the memory model, such that any internal updates within the task are ultimately exposed via the channels.  However, this model while reducing the overhead required to execute the tasks, requires a greater quantity of the non-volatile memory.

How do we then get the tasks and thus the latches?  Given a control flow graph (CFG), task boundaries will need to be inserted between specific basic blocks of the graph.  The compiler can be extended to model (or receive model results) of the energy requirements for each block and thereby estimate which path segments will have sufficient energy for complete execution.  Each block actually has not a single energy, but a PDF based on possible microarchitectural effects.  Then the model combines these PDFs to compute the CDF to determine the probability that a given path will successfully execute given a specific amount of energy available.  Note, each task boundary imposes overhead both in time and therefore energy, so we want the set of task boundaries to minimize overhead, while also accounting for task failures wasting energy.  This compiler pass produces better task decompositions than are achieved via manual programmer annotations, as provided by prior work.

Other system support issues.  This system should also have dynamic ability to select the stored energy necessary for task execution.  This change first requires splitting the energy storage device into multiple banks in hardware.  Also, debugging issues in the system is difficult, particularly where the device is expecting to regularly "fail", so a new debugger was prepared that can combine the program state of traditional debuggers, while still supporting the device to be intermittent.  Such devices will also need further design for intermittent networking stacks, and then be built into a larger IoT hierarchy.

In conclusion, energy-harvesting embedded computers will form the edge of the IoT hierarchy.  And the system stack will form the basis for support.

Tuesday, February 27, 2018

Conference Attendance SIGCSE 2018

I have just finished attending SIGCSE 2018 in Baltimore.  In contrast to my earlier conference attendance, this time I have had higher involvement in its execution.

On Wednesday I went to the New Educator's Workshop (NEW).  Even being faculty for two years, there was still a number of things that were either new or good reminders.  Such as including or discussing learning objectives with each lecture and assignment, or being careful with increasing one's level of service.  As a new faculty member, each service request seems exciting, as no one has asked me before!  But many senior faculty emphasized that this is the time in which they are protecting us from lots of service opportunities such that we can spend time on our teaching and research.

On Thursday morning, I presented my recent work that updated a programming assignment in Introduction to Computer Systems, and from which we saw improvements in student exam scores.  We did not research the specific action, and are therefore left with two theories.  First, the improvement could be from using better style in the starter code and emphasizing this style in submissions.  Second, we redesigned the traces to require submissions to address different cases and thereby implement different features.  I lean toward the formed, but have no data driven basis for this hypothesis.

Let's discuss active learning briefly.  I attended (or ran) several sessions focused on this class of techniques.  The basic idea is that students have better engagement and learning by actively participating in class.  There are a variety of techniques that work to help increase student activity.  On Thursday afternoon, Sat Garcia of USD, presented Improving Classroom Preparedness Using Guided Practice, which showed how student learning improved from participating in Peer Instruction, which particularly requires students to come to class prepared.  Shortly later, Cynthia Taylor joined Sat and I in organizing a Bird of Feather (BoF) session on using Active-learning in Systems Courses.  We had about 30-40 attendees there split into two groups discussing some techniques they have used and problems they have observed.  5 years ago, a similar BoF had attendance around 15-20, so we are making progress as a field.

On Friday, I spoke with Brandon Myers who has done work on using POGIL in Computer Organization and Architecture.  In POGIL, students are working in groups of 3-4 with specific roles through a guided learning, guiding students into discovering the concepts themselves.  We had a nice conversation and may be merging our draft resources.  This last point is often the tricky part of using active learning in that developing reasonable materials can be both time intensive and requires several iterations.

The Friday morning keynote presentation was given by Tim Bell, who spoke about K-12.  This topic is rather distant from my own work and research, so I was skeptical.  Yet, I came out quite enthused.  It was interesting to think about presenting Computer Science concepts in non-traditional ways, based initially on having to explain your field at elementary school when the other presenters are a cop and a nurse (his example).  How could you get 6 year olds to sort?  Or see the advantage of binary search as the data grows?

In the afternoon, I was a session chair for the first time.  I moderated the session on Errors, so obviously the AV system stopped working for a short duration.  Beyond that incident, the session seemed to go well.

I always like going to SIGCSE.  It is rejuvenating and exhausting.  So many teachers to speak with about courses, curriculum, and other related topics.  And then you find that you've been social for 16 hours or so hours.

Friday, January 19, 2018

The Importance of Debugging

How do you teach students about debugging?  To have The Debugging Mind-Set?  Can they reason about possible causes of incorrect behavior?

For the past year, I have been revising material to help students learn about using gdb to assist in debugging, which is an improvement over the "printf-based" methods previously.  And while this approach is usually used when the program has crashed from a segfault, many students are stymied when the problem is incorrect behavior rather than invalid behavior.

When their program crashes, they usually appreciate that gdb can show them what line of code / assembly has crashed.  But how can a student "debug" incorrect behavior?  Many try the "instructor" debugging method (they try this too when the code is crashing), where they either present their code or describe the basics of what they have done and ask us, as an oracle, what is wrong.  I try to offer questions that they need to answer about their code.  Sometimes the student follows well and this is valuable guidance for him or her to solve the behavior issue.

Other times I have asked these questions, trying to build up a set of hypotheses to test and investigate, and the student effectively rejects them.  Not for being wrong, but for not clearly being the answer.  They have the subconscious idea that their code is failing for reason X, which was their intuitive guess (these guesses are a good start).  But the idea that they just do not know enough and need to collect more data is not grasped.

You are a doctor when debugging.  Sometimes the patient gives clear symptoms.  And other times, you need to run more tests.  Again, thankfully, usually if an instructor recommends running a certain test, the data gleaned is enough to guide them through the diagnosis.  Students appreciate when this happens; however, there is creativity in considering other possibilities and sometimes that possibility requires being open to everything (see TNG Finale).

This semester I am co-teaching Operating Systems.  We have told the students that you have to know how to debug, as sometimes printf is what you have to debug.  And other times, to quote James Mickens, "I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS."  So in the dystopian, real-world, you need all the debugging tools and techniques you can get.

Wednesday, November 22, 2017

Review: Languages and Code Quality in GitHub

The study is several years old, but was recently reprinted in the Communications of the ACM.  In it, they mined GitHub data for active open source projects, collecting the defect and development rates.  They classified the defects according to their type, and the development language according to their feature.  And they found that language choice matters, marginally.  Some types of bug are far more common, such as memory management bugs in C / C++.  Functional languages have the lowest rate; however, this analysis is only based on the commit history and does not also analyze development time, or differences in programmers.  So the takeaway is that language features do matter, but programmers just write buggy code.

Tuesday, October 24, 2017

PhD Defense - Low-level Concurrent Programming Using the Relaxed Memory Calculus

Today, I went to Michael Sullivan's thesis defense, who passed.  The work was at a delightful intersection of my interests.

We want better (more usable, etc) semantics for low-level operations, those below the std::atomic<> and similar designs.  Perhaps this is achievable with ordering constraints.  Given the following simple example, what constraints are required?

int data, flag;

void send(int msg) {
  data = msg;
  flag = 1;

int recv() {
  while (!flag) continue;
  return data;

Two constraints: data ("visible") before flag, flag ("executed") before data.  These constraints are explicitly programmer-specified, and that it is contended that this is practical.

rmc::atomic<T> - a variable that can be concurrently accessed
L(label, expr) - labels an expression
VEDGE and XEDGE - specify orders between labeled expressions, effectively V is write visibility ordering and X is execution of read ordering
rmc::push() or PEDGE - Pushes have a total order, and provide orderings between reads and writes which is not possible with just V and X.

In more advanced space, do we need to add constraints to spinlock_lock and spinlock_unlock?  Let's add two special labels: pre, post.  These serve for interface boundaries to denote that everything has executed before this point, or is visible.

Next problem is loop iterations.  Do the constraints need to be within a single iteration or constraining every iteration?  Extend the order specifiers, so in the following, the ordering constraint is just within the iteration, whereas the constraint outside the iteration (without "_HERE") is also between the iterations.

for (i = 0; i < 2; i++) {
  VEDGE_HERE(before, after);
  L(before, x = i);
  L(after, y = i + 10);

Code extends LLVM and is on GitHub.  The compiler takes the RMC extensions and puts the appropriate fence instructions into the IR, and then the existing compiler lowers this to assembly.  The compiler uses an SMT solver to determine the minimal set of locations that need the necessary fences (or other instructions).  Then in lowering, the lowering to assembly can better take advantage of the specific constraints required.  Overall, the performance is better than the C++11 model on ARMv7, Power, and comparable on ARMv8.  I suspect that x86's TSO model is not as interesting for finding performance benefits.

Usable / practical - Can Seqlocks Get Along With Programming Language Memory Models?  argues that C++11 would require acquire semantics on unlock.  Here it is stated that RMC is much more straightforward.  Further, students in 15-418 found gains from RMC versus the C11 model.

Other future items include the exploration of whether there are additional consistency instructions that might provide a better nuance for the compiler to inform hardware about required orderings.  Recall that the coarsest grained instruction is the full memory fence.