Tuesday, February 27, 2018

Conference Attendance SIGCSE 2018

I have just finished attending SIGCSE 2018 in Baltimore.  In contrast to my earlier conference attendance, this time I have had higher involvement in its execution.

On Wednesday I went to the New Educator's Workshop (NEW).  Even being faculty for two years, there was still a number of things that were either new or good reminders.  Such as including or discussing learning objectives with each lecture and assignment, or being careful with increasing one's level of service.  As a new faculty member, each service request seems exciting, as no one has asked me before!  But many senior faculty emphasized that this is the time in which they are protecting us from lots of service opportunities such that we can spend time on our teaching and research.

On Thursday morning, I presented my recent work that updated a programming assignment in Introduction to Computer Systems, and from which we saw improvements in student exam scores.  We did not research the specific action, and are therefore left with two theories.  First, the improvement could be from using better style in the starter code and emphasizing this style in submissions.  Second, we redesigned the traces to require submissions to address different cases and thereby implement different features.  I lean toward the formed, but have no data driven basis for this hypothesis.

Let's discuss active learning briefly.  I attended (or ran) several sessions focused on this class of techniques.  The basic idea is that students have better engagement and learning by actively participating in class.  There are a variety of techniques that work to help increase student activity.  On Thursday afternoon, Sat Garcia of USD, presented Improving Classroom Preparedness Using Guided Practice, which showed how student learning improved from participating in Peer Instruction, which particularly requires students to come to class prepared.  Shortly later, Cynthia Taylor joined Sat and I in organizing a Bird of Feather (BoF) session on using Active-learning in Systems Courses.  We had about 30-40 attendees there split into two groups discussing some techniques they have used and problems they have observed.  5 years ago, a similar BoF had attendance around 15-20, so we are making progress as a field.

On Friday, I spoke with Brandon Myers who has done work on using POGIL in Computer Organization and Architecture.  In POGIL, students are working in groups of 3-4 with specific roles through a guided learning, guiding students into discovering the concepts themselves.  We had a nice conversation and may be merging our draft resources.  This last point is often the tricky part of using active learning in that developing reasonable materials can be both time intensive and requires several iterations.

The Friday morning keynote presentation was given by Tim Bell, who spoke about K-12.  This topic is rather distant from my own work and research, so I was skeptical.  Yet, I came out quite enthused.  It was interesting to think about presenting Computer Science concepts in non-traditional ways, based initially on having to explain your field at elementary school when the other presenters are a cop and a nurse (his example).  How could you get 6 year olds to sort?  Or see the advantage of binary search as the data grows?

In the afternoon, I was a session chair for the first time.  I moderated the session on Errors, so obviously the AV system stopped working for a short duration.  Beyond that incident, the session seemed to go well.

I always like going to SIGCSE.  It is rejuvenating and exhausting.  So many teachers to speak with about courses, curriculum, and other related topics.  And then you find that you've been social for 16 hours or so hours.

Friday, January 19, 2018

The Importance of Debugging

How do you teach students about debugging?  To have The Debugging Mind-Set?  Can they reason about possible causes of incorrect behavior?

For the past year, I have been revising material to help students learn about using gdb to assist in debugging, which is an improvement over the "printf-based" methods previously.  And while this approach is usually used when the program has crashed from a segfault, many students are stymied when the problem is incorrect behavior rather than invalid behavior.

When their program crashes, they usually appreciate that gdb can show them what line of code / assembly has crashed.  But how can a student "debug" incorrect behavior?  Many try the "instructor" debugging method (they try this too when the code is crashing), where they either present their code or describe the basics of what they have done and ask us, as an oracle, what is wrong.  I try to offer questions that they need to answer about their code.  Sometimes the student follows well and this is valuable guidance for him or her to solve the behavior issue.

Other times I have asked these questions, trying to build up a set of hypotheses to test and investigate, and the student effectively rejects them.  Not for being wrong, but for not clearly being the answer.  They have the subconscious idea that their code is failing for reason X, which was their intuitive guess (these guesses are a good start).  But the idea that they just do not know enough and need to collect more data is not grasped.

You are a doctor when debugging.  Sometimes the patient gives clear symptoms.  And other times, you need to run more tests.  Again, thankfully, usually if an instructor recommends running a certain test, the data gleaned is enough to guide them through the diagnosis.  Students appreciate when this happens; however, there is creativity in considering other possibilities and sometimes that possibility requires being open to everything (see TNG Finale).

This semester I am co-teaching Operating Systems.  We have told the students that you have to know how to debug, as sometimes printf is what you have to debug.  And other times, to quote James Mickens, "I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS."  So in the dystopian, real-world, you need all the debugging tools and techniques you can get.

Wednesday, November 22, 2017

Review: Languages and Code Quality in GitHub

The study is several years old, but was recently reprinted in the Communications of the ACM.  In it, they mined GitHub data for active open source projects, collecting the defect and development rates.  They classified the defects according to their type, and the development language according to their feature.  And they found that language choice matters, marginally.  Some types of bug are far more common, such as memory management bugs in C / C++.  Functional languages have the lowest rate; however, this analysis is only based on the commit history and does not also analyze development time, or differences in programmers.  So the takeaway is that language features do matter, but programmers just write buggy code.

Tuesday, October 24, 2017

PhD Defense - Low-level Concurrent Programming Using the Relaxed Memory Calculus

Today, I went to Michael Sullivan's thesis defense, who passed.  The work was at a delightful intersection of my interests.

We want better (more usable, etc) semantics for low-level operations, those below the std::atomic<> and similar designs.  Perhaps this is achievable with ordering constraints.  Given the following simple example, what constraints are required?

int data, flag;

void send(int msg) {
  data = msg;
  flag = 1;

int recv() {
  while (!flag) continue;
  return data;

Two constraints: data ("visible") before flag, flag ("executed") before data.  These constraints are explicitly programmer-specified, and that it is contended that this is practical.

rmc::atomic<T> - a variable that can be concurrently accessed
L(label, expr) - labels an expression
VEDGE and XEDGE - specify orders between labeled expressions, effectively V is write visibility ordering and X is execution of read ordering
rmc::push() or PEDGE - Pushes have a total order, and provide orderings between reads and writes which is not possible with just V and X.

In more advanced space, do we need to add constraints to spinlock_lock and spinlock_unlock?  Let's add two special labels: pre, post.  These serve for interface boundaries to denote that everything has executed before this point, or is visible.

Next problem is loop iterations.  Do the constraints need to be within a single iteration or constraining every iteration?  Extend the order specifiers, so in the following, the ordering constraint is just within the iteration, whereas the constraint outside the iteration (without "_HERE") is also between the iterations.

for (i = 0; i < 2; i++) {
  VEDGE_HERE(before, after);
  L(before, x = i);
  L(after, y = i + 10);

Code extends LLVM and is on GitHub.  The compiler takes the RMC extensions and puts the appropriate fence instructions into the IR, and then the existing compiler lowers this to assembly.  The compiler uses an SMT solver to determine the minimal set of locations that need the necessary fences (or other instructions).  Then in lowering, the lowering to assembly can better take advantage of the specific constraints required.  Overall, the performance is better than the C++11 model on ARMv7, Power, and comparable on ARMv8.  I suspect that x86's TSO model is not as interesting for finding performance benefits.

Usable / practical - Can Seqlocks Get Along With Programming Language Memory Models?  argues that C++11 would require acquire semantics on unlock.  Here it is stated that RMC is much more straightforward.  Further, students in 15-418 found gains from RMC versus the C11 model.

Other future items include the exploration of whether there are additional consistency instructions that might provide a better nuance for the compiler to inform hardware about required orderings.  Recall that the coarsest grained instruction is the full memory fence.

Thursday, July 13, 2017

PhD Defense - Automated Data-Driven Hint Generation for Learning Programming

Kelly Rivers defended her PhD work this afternoon.  She will returning to CMU this fall as a teaching professor.

Student enrollment is increasing, so more work is needed to automate the support, as TAs / instructors are not scaling.  Prior work (The Hint Factory) developed models based on prior student submissions, and then a current student's work can be found within the model thus providing suggestions for how to proceed.  However, programming may not fit within this model due to the larger and more varied space for which students can solve the problems.

First, student code proceeds through a series of canonicalization steps - AST, anonymized, simplification.  Such that the following python code is transformed:

import string
def any_lowercase(s):
  lst = [string.ascii_lowercase]
  for elem in s:
    if (elem in lst) == True:
      return True
    return False


import string
def any_lowercase(p0):
  for v1 in p0:
    return (v1 in string.ascii_lowercase)

Studies then went over 41 different problems with hundreds of correct solutions and thousands of incorrect solutions.  The model can then generate the edits and chain these hints as necessary.  In more than 99.9% of cases, the model could successfully generate a hint chain to reach a correct solution.

To further test this model and approach, the model started with the empty space (just teacher solution) and was compared against the final model.  Ideally, the final model will propose fewer edits than the initial model.  And for 56% of problems, this was true.  40% of problems were already optimal.  And 3% are opportunities for improvement to the model.

Next, given this model exists, how do the hints impact student learning?  Select half of the students to give them access to the hint model optionally.  Using a pre / post assessment, the measurement was a wash.  Instead, a second study was designed that required the students to use the system within a two hour OLI module.  Hints would be provided with every submission and either before or after the midtest in the OLI module.  Only 1/2 of the students actually proceeded through the module in order.  However, most learning was just within the pretest->practice->midtest, so adding those students increased the population.  The results show that the hints reduce the time required to learn the equal amount.

From interviews with students, students need and want targeted help on their work.  However, the hints generated thus far were not always useful.  Proposed another study based on different styles of hints: location, next-step, structure, and solution.  This study found that participants with lower expertise wanted more detailed hints.  Hint usage would sometimes be for what is wrong versus how to solve it.  And often, students know what to do, and just need to reference (via example / prior work) how to do this, rather than hinting what to do.

Monday, July 10, 2017

Wrote my own Calling Convention

I have been listening to the Dungeon Hacks audiobook, and it has both reminded of my past joys of playing Angband, as well as some interesting little "hacks" I did in high school.

In high school, I wrote many programs on my TI-83, often to help me in other classes, which lead to an interesting conversation:
Student: "Teacher, Brian has written programs on his calculator for this class."
Teacher: calls me forward "Brian, is this true?"
Me: "Yes."
Teacher: "Did you write them yourself?"
Me: "Yes."
Teacher: "I do not see what the problem is."

And besides writing useful programs, I also wrote games.  All in the TI-83's BASIC.  However, the TI-83 only had 24kB of space for programs and data.  And eventually I started exceeding this limit.  So I started finding interesting hacks that would reduce the space of programs, such as the trailing " of a string is not required if that string ends the line.  Each would save 1 byte, but it started to add up.

Now, the TI-BASIC on the 83 was very primitive.  Particularly it had no GOSUB instruction, only GOTO.  So you cannot write functions / subroutines and reuse code, which would both improve the design and reduce the space required.  Now I already knew the basics (no pun intended) of C, so I knew that code should be able to call other functions and get values back.  But TI-BASIC would let you call another program and then return to the calling program after the called one finished.  That's like a function call, right?

Therefore, I wrote a library program.  Variables were global, so the calling program could set specific parameters in the global variables, call the library program.  The library would use one parameter to determine which functionality to execute, update the necessary globals and then exit, thus returning.  And consequently, I had a library of common functions which sped up my development and reduced the space I needed.

And so it was only in listening to the audio book last week, did I realize that long ago I had developed a simple calling convention in high school.  And now I teach, among other things, the x86-64 Linux ABI (i.e. calling convention) to college students.

A calling convention just dictates how each register is used when calling a function.  Which ones need to be preserved, arguments, return value, etc.  And it also dictates the management of the stack.

Wednesday, May 3, 2017

PhD Defense - Meeting Tail Latency SLOs in Shared Networked Storage

Today I went to Timothy Zhu's PhD thesis defense.  His work is on achieving better sharing of data center resources to improve performance, and particularly to reduce tail latency.  He also TA'd for me last fall.

Workloads are generally bursty, and their characteristics are different.  Furthermore, they may have service level objectives (SLOs), and the system needs to meet these different objectives.  And the system contains a variety of resources that must be shared in some form.  It is not sufficient to just divide the bandwidth.  Nor can the system measure the latency and try reacting, particularly as bursty workloads do not give sufficient time to react.  While each workload has deadlines, it would be too complex to tag request packets with the deadlines for queuing and routing.  However, the deadlines can be used to generate priorities for requests.

The system is architected to have storage and network enforcement components to ensure QoS.  There is also a controller that receives an initial trace to characterize each workload, and that workload's SLOs.  The controller works through a sequence of analyses to successfully place each workload into the overall system.

Effectively, each workload is assigned a "bucket" of tokens, where the bucket size provides the ability to handle bursts and the rate that tokens are added covers the request rate for the workload.  Shorter burstier workloads receive large buckets and low rates, while constant workloads with little bursts have high rates and small buckets.  In both cases, only when the bucket is empty, is the workload rate-limited in its requests, and these requests receive the lowest priority.  Deterministic Network Calculus (DNC) to model the worst-case queue scenarios.  This plots two curves: the requesting flow and the service curve, both plotted as tokens by function of window size (dt).  The maximum distance between the curves is the maximum latency.

Using three traces: DisplayAds, MSN, and LiveMaps, they tested three approaches: Cake (reactive approach), earliest deadline first, and Timothy's scheme (PriorityMeister).  His scheme did significantly better than the others at meeting the SLOs.  However, the DNC analysis was based on achieving 100% and not the SLO's 99% (or other percentile success).  Depending on the characteristics, there can be significant differences between these guarantees.  To model the latency percentiles, Stochastic Network Calculus (SNC) can achieve this; however, the math is significantly more complex.  And the math had not previously been applied to this problem.  DNC is still better when assuming that bursts are correlated or the system is in an adversarial setting.  Reducing these assumptions (uncorrelated workloads), the SNC-based analysis permitted the system to admit 3x workloads versus the DNC analysis.

Workloads have a curve of satisfying bucket sizes and token rate pairs.  Many systems require the user to provide its rate limit.  Other systems use simple heuristics to either find the "knee of the curve" or select a rate limit as a multiple of the average rate.  However, for an individual workload, all pairs are satisfying, it is only when workloads are combined in a system do the different pairs matter.  The configurations of the set of workloads on the system can be solved for using a system of linear equations.  Therefore, when placing new workloads, the controlling architecture can find successful placements, while potentially reconfiguring the workloads assigned.

One extension would be addressing failure modes.  Currently, the system is assumed to be at degraded performance when components have failed.