Thursday, November 29, 2018

Seminar Talk: Computer Science Pedagogy - Miranda Parker

This week I have been co-hosting Miranda Parker, from Georgia Tech, as part of our Colloquium on Computer Science Pedagogy.  Her talk is titled Barriers to Computing: What Prevents CS for All.

The question is: what are the barriers to accessing Computer Science at the high school level?  Nationwide, ~35% of high schools offer at least one CS course, although this is self-reported.  In Indiana, the most popular courses are taken by ~5000 students state-wide.  In Austin, Texas, about 6000 students out of ~110,000 take at least one CS course.

How do we know if students are successful (i.e., did they learn)?

For this, we need a validated assessment to ensure that we are measuring what we want to measure.  An initial assessment, FCS1, worked fairly well across multiple introductory programming languages; however, the assessment had limits in its use, which led to the SCS1.  This assessment correlated well with FCS1, so it can stand in; however, it was an hour long.  Assessments should cover the content, vary in difficulty, and have good discrimination (so individual scores are good predictors of overall performance).  In analysis, most of the SCS1 questions were hard, and few provided good discrimination.  The assessment was then adapted to focus on the medium-difficulty problems that discriminate well, yielding a 30-minute test.

With a measurement of whether students are succeeding in Computer Science, we can work backward to investigate what influences students to succeed in Computer Science.

Among previously studied factors, we know that students do better in CS if they have more prior experience with Computer Science, and students do better in CS (as well as STEM and general education) with higher socioeconomic status (SES).  There are also known links between SES and access to computing (whether formal courses or informal exposure), and between SES and spatial reasoning.  And both of these latter components are linked to CS achievement.

An exploratory study (at a large, public university with mainly high-SES students) showed a statistical correlation between spatial reasoning and CS achievement.  There was no correlation between access and achievement.  In the interest of time, this point was not presented further.

Barriers to computing

There are three main sources of access to Computer Science courses: state policies, geography (such as local circumstances, industry, or neighboring schools with CS), and resources (such as time and money, or rural versus urban differences).

In partnership with the Georgia Department of Education, she collected CS enrollment data from 2012-2016, plus other public data (such as characteristics of the county and of the school), for each of the 181 school districts in Georgia, where each district has at least one high school.  CS courses are defined as those that count toward the graduation requirement in Georgia as computer science, as opposed to other computing courses such as web design or office proficiency.  In Georgia, out of 500,000 students, about 6000 took a CS course in a given year, and about 50% of schools offered computer science during at least one year in that time frame.  Note that the 6000 students are actually student-course events, where a student taking two CS courses counts twice.

For those schools, CS enrollment, total high school enrollment, and median income contribute to whether CS will be offered in the next year.

This work is ongoing; the next steps are to visit schools and collect further data on these factors, such as why a school discontinued a course offering or now offers one.  Or what do students who take these courses go on to do?  An audience question wondered whether the CS courses offered relate to the courses offered in other parts of STEM.

Wednesday, October 17, 2018

Thesis Defense: Practical Concurrency Testing

Ben Blum defended his dissertation work today on Practical Concurrency Testing.  What follows are the notes from that defense.

The goal is to prove that a program is correct across arbitrary concurrency.  There are three testing approaches: unit testing of the most likely interleavings, stress testing that is not systematic, and verification that requires separate tools and techniques to describe the program.

Landslide is a proposed technique based on Stateless Model Checking (Godefroid '97), which tests a different execution interleaving on every iteration.  However, naive interleaving provides O(2^n) states to test.  [Flanagan '05] identified equivalent interleavings, and [Musuvathi '08] proposed heuristic orderings to identify possible bugs faster.  This approach can often require annotations, so adoption requires automated instrumentation.  This space is addressing further concurrency problems such as weak memory models, but hardware transactional memory is still open.

This instrumentation requires preemption points.  Finer-grained preemption points find more bugs, but increase the number of states to test.  Bugs and failures follow certain patterns, such as use-after-free, deadlocks, assertion failures, and invalid memory accesses.  Dynamic data-race analysis can help inform the necessary preemption points.

As a reminder, a data race occurs when two threads access the same memory location and:
- at least one of the accesses is a write
- the threads are not holding the same mutex
- nor is there any other ordering requirement (condition variable, etc)
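
To make this concrete, here is a minimal sketch of my own (not from the defense) containing a textbook data race in C: two threads update a shared counter with no mutex or other ordering.

#include <pthread.h>
#include <stdio.h>

static int counter = 0;                 /* shared and unprotected */

static void *incr(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;                      /* racy read-modify-write */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, incr, NULL);
    pthread_create(&b, NULL, incr, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%d\n", counter);            /* often less than 200000 */
    return 0;
}

Both threads write counter, no common mutex is held, and no other ordering exists, so all of the conditions above are met; lost updates make the final count nondeterministic.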

Quicksand applies this analysis to select smaller problem spaces using subsets of the possible preemption points, where each subset represents a smaller part of the larger problem space.  If all of these subsets are verified, that represents a full verification of the program.  Prior work explored using APIs such as mutex_lock/unlock, or every shared variable access, as preemption points.

This tester is deployed in OS courses at CMU, PSU, and U Chicago.  Manual annotation is not viable for students, especially the struggling students for whom the traces would be most valuable.  That said, students regularly deploy ad-hoc synchronization, such as while (!ready) yield();, which requires heuristics, as naive model checking must test every possible count of yields and their interleavings.

When used by students, about 75% of tested kernels / libraries have bugs identifiable by the testing framework.  For the tested submissions (7 semesters) of students at CMU, there is an improvement in grades, but it is not statistically significant when correcting for the opt-in bias.  Most students are then able to fix the bugs found by the tool.

Hardware transactional memory poses a separate challenge for model checking.  Aborted transactions are observationally equivalent to an immediately failed transaction.  Furthermore, all transactions must be assumed to be abortable, as there are many possible causes of aborts.  As prior posts covered, this fact requires that any transaction have a valid abort path.  And this abort path requires most of the verification effort.
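
As a minimal sketch of the abort-path requirement (my own illustration using Intel's RTM intrinsics, not the implementation tested in the defense), a TSX-based lock must fall back to a conventional spinlock whenever the transaction aborts:

#include <immintrin.h>   /* RTM intrinsics; compile with -mrtm */
#include <stdatomic.h>

static atomic_int lock_word = 0;   /* 0 = free, 1 = held */

static void lock(void) {
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Elided path: read the lock word so a conventional holder
         * conflicts with our read set and aborts the transaction. */
        if (atomic_load_explicit(&lock_word, memory_order_relaxed) == 0)
            return;                /* run the critical section speculatively */
        _xabort(0xff);             /* lock already held: go to the abort path */
    }
    /* Abort path: any transaction may abort, so take the real lock. */
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock_word, &expected, 1))
        expected = 0;
}

static void unlock(void) {
    if (_xtest())
        _xend();                   /* commit the elided critical section */
    else
        atomic_store(&lock_word, 0);
}

The abort path here is ordinary mutual exclusion, which is exactly the part that carries most of the verification burden.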

Landslide was tested using hand-written tests, transactional data structures, and a TSX-based spinlock.  Each set of tests has a concurrency or performance bug in its implementation.  What about demonstrating that there are no bugs in an implementation?  With 10 hours of CPU time, verification is only possible for small cases of complex code.  That said, practical testing so far only requires fewer than 4 preemptions to create the buggy scenario.  There can be other bugs requiring an increasingly complex ordering, but generally those are very rare.

Abstraction reduction [Simsa '13] works to reduce primitives within implementations to verified components, such as mutual exclusion, etc.  Using this technique then allows Landslide to verify the complex HTM implementations at higher thread counts.

In attendance were the recent instructors of Operating Systems and the TAs.


Wednesday, October 10, 2018

NASA Talk: Making Miracles Happen

Dr. Thomas H. Zurbuchen, the Associate Administrator of NASA's Science Mission Directorate, gave a talk today about some key lessons he has learned and observed from his years with NASA, as well as the years before.  The following are my quick notes from his talk.

Planning for the future: Voyager launched with the plan to visit Jupiter and Saturn.  But the scientists still had plans for the possibility of visiting Uranus and Neptune based on that launch window.  And also that, within the lifetime of the probes, they might leave the heliosphere.  Imagine that over 40 years after launch, there are still papers to be written in Nature or Science from the mission.  Voyager 2 is nearing this mark, which Voyager 1 has already crossed.

The solar probe problem was to explore close to the Sun.  The desire had been there for 60 years, and eventually, from decades of other work, the technology was there to support components such as solar panels that can operate close to the Sun.  Once the technology was available, the proposal was for a $4 billion probe, which was outside the budget.  But NASA offered to design / launch it for $1 billion, which forced a complete redesign.  The original mission used a single Jupiter flyby to bring the probe to 4 solar radii.  For $1 billion, the flight path would instead use Venus for repeated flybys, eventually lowering the perihelion to 9 solar radii.  While 9 > 4, the Venus flight path provides many flybys of the Sun, which yields a richer dataset across those passes.  This probe is the Parker Solar Probe, exceptionally named for a still-living scientist, Eugene Parker.

Cassini reinforced the need for teams to have diverse backgrounds and training; greater success is possible with great teams.

Hubble provided the research for ~1000 publications in the last year alone.  At the initial launch, there was the well-known flaw in the mirror, after NASA had been predicting great images.  After studying the results, a technician worked out that the mirror was slightly out of position, and with a small shift of the sensor, everything worked.

Shortly after finishing his PhD, Dr. Z came up with a new sensor in '98 that would possibly be part of Messenger at its 2004 launch; the sensor was added to and removed from the craft multiple times.  Even after the launch, it took Messenger 7 years to complete the orbital path and shed sufficient energy so that the probe could enter orbit around Mercury.  This requires patience.

On working in teams, he told a story from his time in the Swiss army.  Barbed wire had to be placed around the camp, and this was the job given to those in trouble.  But he talked with the soldiers on this duty and helped, so eventually, rather than being the trouble job, the team became a good team, and some soldiers wanted to work on it.  Better to be on a good team doing a bad task than on a bad team doing a good task.

The National Academy of Sciences sets the science priorities (I read significantly about this process from priority to mission in Chasing New Horizons), but ultimately the specific mission to meet a science priority is decided by Dr. Z.  That mission then moves through multiple steps and reviews.  One of the key points is that while Dr. Z makes the decision, these decisions are based on good advice from the experts.

For the missions flown, Dr. Z has had about 5 failures for every success.  These failures are things like being removed from a program or a mission being canceled; sometimes they are technical failures of the device or the probe.  Things will fail.  He goes to watch every launch and brings a large mission plan, most of which covers what to do if things go wrong.  Plan for the failure.

Friday, September 14, 2018

Is C low level?

A recent ACM article, C is Not a Low-Level Language, argues that for all of our impressions that C is close to hardware, it is not actually "low-level".  The argument is as follows: C was written for the PDP-11 architecture, and at the time it was a low-level language.  As architectures have evolved, C's machine model has diverged from the hardware, which has forced processor designs to add new features to attain good performance with C, such as superscalar execution for ILP and extensive branch prediction.  Processors must also maintain caches to hide the memory latency, which requires significant logic to maintain coherence and the illusion that memory is shared between the threads of a process.  Furthermore, the compiler is also called upon to find optimization opportunities, which may be unsound and definitely require programmer-years to implement.

The author repeatedly contrasts with GPUs, while noting that they require very specific problems, or come "at the expense of requiring explicitly parallel programs".  If you were not keeping track, a GPU requires thousands of threads to match the CPU's performance.  The author writes, "A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model."  That generally sounds like the GPU design.

I appreciate the callouts of C's shortcomings, which it certainly has.  The notion that C has driven processor design is odd, yet it does reflect the fact that processors are designed to run current programs fast.  And with those programs written in either C or a language built on C, many programs are forced into particular patterns.  I even spent some time in my PhD studies considering a version of this problem: how do you design a new "widget" for the architecture when no programs are written to take advantage of widgets?

All this to say: I think C is a low-level language, and while its distance from hardware may be growing, there is nothing else beneath it.  This is a gap that needs to be addressed, ideally by a language with explicit parallel support.

Wednesday, August 29, 2018

Thesis Defense: Responsive Parallel Computation

We can consider parallelism strategies as either competitive (such as pthreads) or cooperative (such as Cilk).  Cooperative systems assume that the parallel computation is equally distributed across the threads, such that no tasks are prioritized over others.

Into this, Stefan Muller presented a Responsive Parallel Computation approach where the two models can be unified.  Or put another way, "We can extend existing cooperative language and cost models to account for competitive threading constructs, and to design and implement efficient and responsive scheduling algorithms."

Cooperative systems, such as Cilk, rely on task queues and work stealing to implement the parallelism, but when IO is involved, the common implementations use blocking system calls.  In a cooperative system, such a blocking call then blocks one of the worker threads, rather than just the task that is waiting.

For operating system scheduling, there are many threads within the myriad processes in the system that may need to run.  The OS often uses a priority associated with each thread / process to make these scheduling decisions.  When programmers assign priorities (if they do at all), they are often trying to express a relative ordering between different tasks rather than absolute priorities.  As systems limit priorities to anywhere from a handful of levels to as many as 100, the mapping of priority classes to numeric levels can be unclear.
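
As a small illustration of the absolute numeric priorities a programmer faces (my example, not from the defense), POSIX exposes a policy-dependent numeric range, which is part of why mapping priority classes to levels is unclear:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    /* The valid range depends on the scheduling policy and the system. */
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    printf("SCHED_FIFO priorities: %d..%d\n", lo, hi);   /* often 1..99 */

    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = hi };
    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedparam(&attr, &param);   /* an absolute level */
    /* A thread created with attr runs at that fixed numeric priority,
     * with no way to express "higher than X" relative orderings. */
    pthread_attr_destroy(&attr);
    return 0;
}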

Muller proposes an extension to ML, PriML, with two new types:
t cmd[p] - a command at priority p returning t
t thread[p] - thread at priority p returning t

priority and order declarations enable defining priorities and specifying the partial order between them.  spawn[priority] {expression} creates a thread; sync retrieves its result; and poll returns an option that can be matched, so that we can select from multiple threads.  For example, the following PriML sequence spawns two threads to sort emails with different methods and selects the faster one.

t1 <- spawn[p] {do (qsort date emails)};
t2 <- spawn[p] {do (mergesort date emails)};
do (let fun choose () =
  cmd[p]
  {
    v1 <- poll t1;
    v2 <- poll t2;
    do (case (v1, v2) of
      (SOME l, _) =>
        cmd[p] {cancel t2; ret l}
      | (_, SOME l) =>
        cmd[p] {cancel t1; ret l}
      | (NONE, NONE) => choose ())
  }
  in
    choose ()
  end)

These features then allow the language to enforce the priority constraints and avoid priority inversion: sync must always be to an equal or higher priority, while poll can be to any priority.

Returning to parallel program analysis, we represent programs with DAGs; however, the DAG must be extended with priority annotations and the work / span calculations.  This analysis can explore different schedules; for example, adding fairness for all thread priorities extends the bound on execution time.

Monday, August 27, 2018

Book Review: Valley of Genius

Valley of Genius is a history of Silicon Valley based on the people who made it.  Since the work was based on interviews, I expected that it would read as actual interviews, where the dialog exists between the author and the "genius".  Instead, the author was removed from the chapters, and the entire text consists of the different participants being quoted.  This writing style took some time to accept.  Initially, I wanted to know exactly who each person was and their role in the narration, which, given the numbers involved, would significantly detract from the story being told.  Then I stopped bothering with the names, only looking them up when two (or more) speakers referred to each other.  And this aspect is the best part of the book: the different individuals having a debate in the dialog; otherwise each chapter is just narration.  That said, the concern is that context is lost from the quotes, as the author has explicitly stated the interviews were spliced together.

All in all I enjoyed the book, but that only merits 3.5/5 stars.

(A free copy of the book was provided through Goodreads.)

Friday, August 17, 2018

Repost: CRA Memo on Best Practices for Engaging Teaching Faculty in Research Computing Departments

Mark Guzdial noted today that the CRA has prepared its memo about teaching faculty in research departments.  For the past two years, I have been going to a CRA event at SIGCSE geared toward preparing this memo, so it is good to see its release.  I am thankful that at my institution, we have most of the things outlined in the memo and are treated roughly as equals to the tenure-track faculty.

Tuesday, July 31, 2018

Book Review: The Art of Application Performance Testing

The Art of Application Performance Testing covers what it says.  The book starts with concepts general to any performance testing, which was interesting to me.  Most of the text, though, focuses on the Application part of the title.  The applications here are primarily web-based, or other client-server setups, and not the generic "application" referring to any program.  That said, I do not work on such applications, so the remainder of the text was of less value to me.

In testing applications, a performance analyst needs to establish a representative workload, which includes the actions to perform and the combined load.  For example, most users logging in to their bank will view their account balance, while others might transfer money or pay a bill.  Combined, these actions might represent most of the work from users.  Then, for each unit of server capacity, how many users should be able to perform a mix of those actions?  That forms the load.
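
As a toy sketch of such a workload mix (my own, not from the book), each simulated user picks an action according to its assumed share of real traffic; the action names and weights here are hypothetical:

#include <stdio.h>
#include <stdlib.h>

typedef struct { const char *name; int weight; } action_t;  /* weight in percent */

static const action_t mix[] = {
    { "view_balance",   70 },
    { "transfer_money", 20 },
    { "pay_bill",       10 },
};

static const char *pick_action(void) {
    int r = rand() % 100;
    int cum = 0;
    for (size_t i = 0; i < sizeof mix / sizeof mix[0]; i++) {
        cum += mix[i].weight;
        if (r < cum)
            return mix[i].name;
    }
    return mix[0].name;   /* unreachable when the weights sum to 100 */
}

int main(void) {
    for (int i = 0; i < 10; i++)   /* ten simulated user actions */
        printf("%s\n", pick_action());
    return 0;
}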

After establishing the workload, the analyst needs to implement it, which requires a tool that generates the load (either by driving the application itself or by replaying a synthetic trace of the load).  For those tools, what additional hardware is required to deploy this load?  Does the deployment take into account geographic and other user variations (so that the load generation is representative of the user base)?  Finally, what tooling and methodology exist for profiling and recording the execution of the workload for present and future analysis?

So I appreciated the content of the book and would recommend it to individuals focused on testing user-facing applications.

Wednesday, May 16, 2018

Review: Lessons from Building Static Analysis Tools at Google

The Communications of the ACM recently had several development articles, and I found the one on static analysis tools at Google particularly interesting.  The article works through how Google went about integrating static analysis tools into every developer's workflow.  The tools have to be in the workflow, or developers will "forget" to use them.  The second problem with the tools is ensuring that the feedback is useful.  Currently, each developer marks flagged items as either useful or incorrect, and if a tool exceeds a 10% false-positive rate, it is temporarily disabled until that tool's developers can fix the flagged issues.  The third issue is that some tools are expensive: depending on the type of static analysis, the time required may be significant.  Thus the tools are classified into two camps: those run on each compile, and those run on each code review / commit.  It is also important that some tools can be temporarily disabled, such that during debugging or refactoring the code may temporarily mutate into an "unsafe" state to simplify the process.
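
As a hypothetical example of the kind of cheap check that can run on every compile (mine, not from the article), consider an assignment used where a comparison was intended; gcc and clang flag this with -Wall:

#include <stdio.h>

int main(void) {
    int x = 5;
    if (x = 0)          /* BUG: assigns 0 to x; the author meant x == 0 */
        printf("x is zero\n");
    return 0;
}

Pattern checks like this are fast enough for every compile, while heavier whole-program analyses fit better at review or commit time.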

Personally, I am glad that they are integrating analysis tools into the development workflow.  Much work has been done to find bugs and issues within source code, so it is good that these analyses can be utilized regularly to improve code quality.

(As a note, I have never worked for Google, so I can only write based on the ACM article and not personal experience.)

Wednesday, May 2, 2018

Performance of Atomic Operations on NUMA Systems

It is the end of the semester, so it is time for posters about student projects.  I have visited two sessions so far, with three more to go.  I specifically wanted to highlight the results from one poster.

The pair of students wrote a microbenchmark around compare-and-swap: the value is read, a local update is computed, and then compare-and-swap attempts to place the new value into memory iff the old value is still present; otherwise the operation fails and retries (a sketch of such a benchmark appears after the caveats below).  Running the code in a tight loop with a thread per hardware context, there is clearly going to be significant contention.  In this scenario, they had two observations from the results:
  1. If the requesting thread is located on the same node as the memory, it will almost always fail, implying that accessing NUMA-local memory takes a different path than NUMA-remote memory, thereby exhibiting worse performance on contended atomic operations.
  2. The Intel processors had a higher success rate, as neighboring threads were more likely to pass access along between each other.  The AMD system did not exhibit this behavior.
Caveats: the precise NUMA topology was not known, and the AMD processors were several generations older than the Intel processors.
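
Here is a sketch of the kind of microbenchmark described (my reconstruction, not the students' code): each thread spins on compare-and-swap against one shared word and counts how often it wins the contention.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 1000000

static atomic_long shared = 0;

static void *worker(void *arg) {
    long *successes = arg;
    for (long i = 0; i < ITERS; i++) {
        long old = atomic_load(&shared);
        long desired = old + 1;                 /* the "local update" */
        if (atomic_compare_exchange_strong(&shared, &old, desired))
            (*successes)++;                     /* CAS won the contention */
        /* on failure, the next iteration retries with a fresh read */
    }
    return NULL;
}

int main(void) {
    enum { NTHREADS = 8 };                      /* e.g., one per context */
    pthread_t t[NTHREADS];
    long wins[NTHREADS] = {0};
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, &wins[i]);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        printf("thread %d: %ld/%d successes\n", i, wins[i], ITERS);
    }
    return 0;
}

Comparing the per-thread success counts across placements (same node as the memory versus remote) is what surfaced the students' first observation.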

Friday, April 27, 2018

Thesis Defense: Systems Support for Intermittent Computing

Today I attended Alexei Colin's thesis defense titled, Systems Support for Intermittent Computing.

For small, embedded devices, batteries are expensive / difficult, so energy can be harvested from RF, light, temperature gradients, motion, et cetera.  In such a device, this direct energy source is insufficient to power the device continuously, so a small capacitor (or other storage medium) retains the energy until the device can be powered for a short time.  The discharge provides an intermittent period of execution before the power source drops below the threshold for execution.  Programs can be annotated with latches or other progress points, such that execution after a power failure can resume at the latest such point once power is again available.

To model the computation, the program is decomposed into tasks, where each task can only transfer control to other tasks but contains arbitrary code.  Tasks communicate through channels.  The channels provide the memory model, such that any internal updates within a task are ultimately exposed via the channels.  However, while this model reduces the overhead required to execute the tasks, it requires a greater quantity of non-volatile memory.
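
A toy sketch of this execution model (my own illustration, not the actual system): progress is latched in nonvolatile state, so after a power failure the device resumes at the last completed task, with data flowing between tasks through the latched fields.

#include <stdint.h>
#include <stdio.h>

typedef enum { TASK_SAMPLE, TASK_FILTER, TASK_SEND, TASK_DONE } task_t;

/* In a real device this struct would live in nonvolatile memory (e.g.,
 * FRAM); the latched next_task survives power failures. */
struct nv_state {
    task_t next_task;     /* the latch: where to resume after reboot */
    int32_t sample;       /* "channel" data flowing between tasks    */
    int32_t filtered;
};
static struct nv_state nv = { TASK_SAMPLE, 0, 0 };

static int32_t read_sensor(void) { return 42; }           /* stub for the sketch */
static void radio_send(int32_t v) { printf("%d\n", v); }  /* stub for the sketch */

int main(void) {
    for (;;) {
        switch (nv.next_task) {          /* resume at the latched task */
        case TASK_SAMPLE:
            nv.sample = read_sensor();
            nv.next_task = TASK_FILTER;  /* commit progress */
            break;
        case TASK_FILTER:
            nv.filtered = nv.sample / 2;
            nv.next_task = TASK_SEND;
            break;
        case TASK_SEND:
            radio_send(nv.filtered);
            nv.next_task = TASK_DONE;
            break;
        case TASK_DONE:
            return 0;
        }
    }
}

If power fails mid-task, the latch still names that task, so it re-executes from its start; only a completed task advances the latch.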

How do we then get the tasks, and thus the latches?  Given a control flow graph (CFG), task boundaries need to be inserted between specific basic blocks of the graph.  The compiler can be extended to model (or receive model results for) the energy requirements of each block and thereby estimate which path segments will have sufficient energy for complete execution.  Each block actually has not a single energy cost but a PDF, based on possible microarchitectural effects.  The model then combines these PDFs to compute a CDF that gives the probability that a given path will successfully execute with a specific amount of energy available.  Note that each task boundary imposes overhead in time and therefore energy, so we want the set of task boundaries to minimize overhead while also accounting for the energy wasted by task failures.  This compiler pass produces better task decompositions than are achieved via the manual programmer annotations of prior work.

There are other system support issues.  The system should also be able to dynamically select the stored energy necessary for task execution, which first requires splitting the energy storage device into multiple banks in hardware.  Debugging issues in the system is also difficult, particularly as the device is expected to regularly "fail", so a new debugger was prepared that provides the program state of traditional debuggers while still supporting the device's intermittent operation.  Such devices will also need further design for intermittent networking stacks, and will then be built into a larger IoT hierarchy.

In conclusion, energy-harvesting embedded computers will form the edge of the IoT hierarchy.  And the system stack will form the basis for support.

Tuesday, February 27, 2018

Conference Attendance SIGCSE 2018

I have just finished attending SIGCSE 2018 in Baltimore.  In contrast to my earlier conference attendance, this time I have had higher involvement in its execution.

On Wednesday I went to the New Educator's Workshop (NEW).  Even having been faculty for two years, there were still a number of things that were either new or good reminders, such as including or discussing learning objectives with each lecture and assignment, or being careful with increasing one's level of service.  As a new faculty member, each service request seems exciting, as no one has asked me before!  But many senior faculty emphasized that this is the time in which they are protecting us from many service opportunities so that we can spend time on our teaching and research.

On Thursday morning, I presented my recent work that updated a programming assignment in Introduction to Computer Systems, from which we saw improvements in student exam scores.  We did not research the specific cause, and are therefore left with two theories.  First, the improvement could be from using better style in the starter code and emphasizing this style in submissions.  Second, we redesigned the traces to require submissions to address different cases and thereby implement different features.  I lean toward the former, but have no data-driven basis for this hypothesis.

Let's discuss active learning briefly.  I attended (or ran) several sessions focused on this class of techniques.  The basic idea is that students have better engagement and learning by actively participating in class, and there are a variety of techniques that help increase student activity.  On Thursday afternoon, Sat Garcia of USD presented Improving Classroom Preparedness Using Guided Practice, which showed how student learning improved from participating in Peer Instruction, which particularly requires students to come to class prepared.  Shortly after, Cynthia Taylor joined Sat and me in organizing a Birds of a Feather (BoF) session on using Active Learning in Systems Courses.  We had about 30-40 attendees split into two groups discussing techniques they have used and problems they have observed.  Five years ago, a similar BoF had attendance around 15-20, so we are making progress as a field.

On Friday, I spoke with Brandon Myers, who has done work on using POGIL in Computer Organization and Architecture.  In POGIL, students work in groups of 3-4 with specific roles through guided learning, which leads students to discover the concepts themselves.  We had a nice conversation and may merge our draft resources.  This last point is often the tricky part of using active learning: developing reasonable materials can be both time intensive and require several iterations.

The Friday morning keynote presentation was given by Tim Bell, who spoke about K-12.  This topic is rather distant from my own work and research, so I was skeptical.  Yet I came out quite enthused.  It was interesting to think about presenting Computer Science concepts in non-traditional ways, based initially on having to explain your field at an elementary school when the other presenters are a cop and a nurse (his example).  How could you get 6-year-olds to sort?  Or see the advantage of binary search as the data grows?

In the afternoon, I was a session chair for the first time.  I moderated the session on Errors, so obviously the AV system stopped working for a short duration.  Beyond that incident, the session seemed to go well.

I always like going to SIGCSE.  It is rejuvenating and exhausting.  So many teachers to speak with about courses, curriculum, and other related topics.  And then you find that you've been social for 16 hours or so.

Friday, January 19, 2018

The Importance of Debugging

How do you teach students about debugging?  To have The Debugging Mind-Set?  Can they reason about possible causes of incorrect behavior?

For the past year, I have been revising material to help students learn to use gdb to assist in debugging, which is an improvement over the "printf-based" methods used previously.  While this approach is usually adopted once a program has crashed from a segfault, many students are stymied when the problem is incorrect behavior rather than invalid behavior.

When their program crashes, they usually appreciate that gdb can show them what line of code / assembly crashed.  But how can a student "debug" incorrect behavior?  Many try the "instructor" debugging method (they try this too when the code is crashing), where they either present their code or describe the basics of what they have done and ask us, as an oracle, what is wrong.  I try to offer questions that they need to answer about their code.  Sometimes the student follows well, and this is valuable guidance toward solving the behavioral issue.
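
As a small, hypothetical example of incorrect (rather than crashing) behavior, consider a program that runs to completion but prints the wrong sum:

/* buggy.c: runs without crashing but prints the wrong sum. */
#include <stdio.h>

static int sum(const int *a, int n) {
    int total = 0;
    for (int i = 1; i < n; i++)   /* BUG: skips a[0]; should start at i = 0 */
        total += a[i];
    return total;
}

int main(void) {
    int a[] = {1, 2, 3, 4};
    printf("%d\n", sum(a, 4));    /* prints 9, expected 10 */
    return 0;
}

Even without a crash, gdb helps here: break sum stops at the call, and watch total reports each update to total, revealing that a[0] is never added.  The point is collecting data rather than guessing.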

Other times I have asked these questions, trying to build up a set of hypotheses to test and investigate, and the student effectively rejects them: not for being wrong, but for not clearly being the answer.  They have the subconscious idea that their code is failing for reason X, their intuitive guess (these guesses are a good start).  But they have not grasped the idea that they simply do not know enough yet and need to collect more data.

You are a doctor when debugging.  Sometimes the patient gives clear symptoms, and other times you need to run more tests.  Thankfully, if an instructor recommends running a certain test, the data gleaned is usually enough to guide the student through the diagnosis.  Students appreciate when this happens; however, there is creativity in considering other possibilities, and sometimes that requires being open to everything (see the TNG finale).

This semester I am co-teaching Operating Systems.  We have told the students that they have to know how to debug, as sometimes printf is all you have to debug with.  And other times, to quote James Mickens, "I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS."  So in the dystopian real world, you need all the debugging tools and techniques you can get.