Monday, August 24, 2020

The Martian, Computer Science, and College

This summer, the school asked professors whether they would be interested in leading book discussions with incoming first-year Computer Science students.  I, along with many other professors, volunteered, and each of us could select our own title to discuss.  I proposed reading The Martian by Andy Weir.  What follows is not a review of the book, which I really enjoyed, but rather a summary of the discussion points from the hour we had together.

The following text contains many book spoilers.

We started the discussion with a short summary of my background, and then a student asked about the rover hacking in The Martian.  It is, in my opinion, plausible.  It depends on several assumptions, such as whether the rover's driver could be modified so easily to log the malformed network data sent by the probe.  It would then be reasonable to send commands to the probe to broadcast the data necessary to construct an executable on the rover.  Then, assuming that Mark can run it with sufficient privileges, or that a known vulnerability lets the executable gain those privileges, the probe's data could be a patch.  Personally, I enjoyed the thought of using ASCII to communicate, and my TAs and I agree: "man ascii".  Besides, I carry an ASCII chart in my wallet.
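In the book, Mark sends each character as two hexadecimal digits, one camera position per digit, so a message is just its ASCII bytes written out in hex.  Here is a minimal Python sketch of that encoding, assuming plain 7-bit ASCII text; the function names are my own and purely illustrative.

```python
def encode_ascii(message: str) -> str:
    """Spell each ASCII character as two uppercase hex digits (e.g. 'A' -> '41')."""
    if not message.isascii():
        raise ValueError("only 7-bit ASCII characters are supported")
    return "".join(format(ord(ch), "02X") for ch in message)


def decode_ascii(digits: str) -> str:
    """Rebuild the message by reading the hex digits two at a time."""
    if len(digits) % 2 != 0:
        raise ValueError("expected an even number of hex digits")
    return "".join(chr(int(digits[i:i + 2], 16)) for i in range(0, len(digits), 2))


print(encode_ascii("HELLO"))        # 48454C4C4F
print(decode_ascii("48454C4C4F"))   # HELLO
```

Pairing the digits keeps each camera reading down to one of sixteen positions, at the cost of two readings per character.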

We discussed how there was significant cooperation in solving the problems.  The crew worked together.  NASA had many teams working on the problems.  Internationally, China also provided assistance.  The people working on these problems were diverse.  And there were continual concerns about the mental state of the crew and of Mark.  Similarly, Computer Science students will need to learn to work with others in groups with diverse backgrounds and skill sets, and to know that there are many, many more people who want to see them succeed and who are willing to support them in leading successful lives and in taking action whenever circumstances dictate.

Mark Watney survives in part because of his diverse education and training.  As a fictional character, he conveniently has the right skills to survive, but this is grounded in reality.  Astronauts are trained in a diverse set of skills, particularly to maximize the value gained from their time in space.  They are not experts in everything, but rather are trained well enough to carry out the guidance of experts on Earth.  Similarly, I emphasized to the students that their studies should give them a broad foundation beyond Computer Science.

The final topic brought up by the students was ethics.  First, should NASA have told the crew of the Hermes that Mark Watney was alive on Mars as soon as this was determined, or instead censored all communication so that they would not learn that they had abandoned him?  What is the trade-off between the truth and mission results?  Second, the Chinese scientists had to make a decision: is the life of one astronaut worth their probe?  Should they give up their long-prepared mission of great scientific value to instead make a "grocery delivery"?  How much is one life worth?  Third, when the Rich Purnell plan presented an alternative way to rescue Mark, was NASA obligated to consult the crew in evaluating this option?  Relatedly, the crew of the Hermes decided to return to Mars to save Mark Watney, accepting the plan's risk of killing everyone and extending their mission.  We also briefly discussed how governments likewise have to decide how much a life is worth.  The story itself notes that the science Mark can perform makes up for the cost of his rescue, which addresses this concern within the story.

I think that the 20 or so students appreciated the hour we had together.  I hope someday to be able to meet them and ultimately teach them in person.

Wednesday, April 22, 2020

Thesis Proposal - Lightweight Preemptable Functions

Sol Boucher presented his thesis proposal today via Zoom.  It was well attended and I think many of us are already looking forward to his defense.

Lightweight Preemptable Functions (LPF)
Before invoking a function from time-critical code, a program needs some knowledge of the call's cost in order to know whether it can afford to invoke it.

The talk first reviewed three current approaches to multitasking: futures, threads, and processes.  Futures rely on the runtime to provide asynchronous execution, but preemption requires yield points, not all code can support these points, and it may not be clear what a given future will do.  With threads, execution is fairly independent, but the scheduler is given minimal information and cancellation is hard.  Processes provide good support for cancellation, except that fork() carries only the calling thread over into the new process.  The other threads are not copied, which can leave inconsistent state, such as a lock held by one of those threads that nothing in the child will ever release.
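The fork() problem is easy to demonstrate.  In this sketch (POSIX only, written in Python for brevity), a worker thread holds a lock while the main thread forks.  The child gets a copy of the lock in its "held" state, but not the thread that owns it.  The function name is my own, and the expected output assumes Linux or another POSIX system with CPython.

```python
import os
import threading
import warnings


def fork_while_worker_holds_lock() -> tuple[int, bool]:
    """Fork while another thread owns a lock and report what the child sees.

    Returns (threads alive in the child, whether the child could take the lock).
    POSIX only: os.fork() is unavailable on Windows.
    """
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def worker() -> None:
        with lock:              # the worker owns the lock...
            holding.set()
            release.wait()      # ...until the parent tells it to finish

    t = threading.Thread(target=worker)
    t.start()
    holding.wait()              # make sure the lock is held before forking

    r, w = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns about forking a multi-threaded process; that is the point here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread exists, yet the lock is still marked held.
        os.close(r)
        got = lock.acquire(blocking=False)
        os.write(w, f"{threading.active_count()} {int(got)}".encode())
        os._exit(0)

    os.close(w)
    reply = os.read(r, 64).decode()
    os.close(r)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    threads, got = reply.split()
    return int(threads), got == "1"


print(fork_while_worker_holds_lock())  # (1, False)
```

A blocking acquire in the child would simply hang forever, which is the "inconsistent state" in its most common form.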

The thesis proposal is: Introducing a novel abstraction for preemption at the granularity of a synchronous function call, which includes a timeout.

The interface is launch(function, timeout, argument), which returns a continuation.  If the timeout elapses, the continuation is returned and the caller then chooses whether to resume or cancel it.  The LPF executes within the same thread context as the caller, thereby reducing overhead.  However, so that it can be resumed, the LPF needs its own stack.  To support the timeout, the runtime relies on a timer signal that can fire every ~5 microseconds.  Launch and resume have overhead comparable to this, significantly lower than fork() or pthread_create().  Cancel, however, is extremely expensive.
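To make the launch / resume / cancel flow concrete, here is a cooperative Python model of the interface.  It is only a sketch under a big assumption: Python offers no safe way to suspend an arbitrary function from a timer signal and resume it later, so the function here is a generator that can only be paused where it executes yield.  Real LPFs need no such yield points.  The names Continuation, launch, and sum_to are hypothetical and mirror the talk's description, not the actual runtime's API.

```python
import time
from typing import Any, Callable, Generator


class Continuation:
    """A paused or finished call, modeled on the result of launch().

    Hypothetical model: the function is a generator, so it can only be
    paused at `yield`. Real LPFs are preempted by a timer signal anywhere.
    """

    def __init__(self, gen: Generator[None, None, Any]) -> None:
        self._gen = gen
        self.done = False
        self.cancelled = False
        self.result: Any = None

    def resume(self, timeout: float) -> "Continuation":
        """Run until the function returns or `timeout` seconds have elapsed."""
        if self.cancelled:
            raise RuntimeError("cannot resume a cancelled call")
        if self.done:
            return self
        deadline = time.monotonic() + timeout
        try:
            while True:
                next(self._gen)                  # run to the next checkpoint
                if time.monotonic() >= deadline:
                    return self                  # out of time: the caller decides
        except StopIteration as stop:
            self.done, self.result = True, stop.value
            return self

    def cancel(self) -> None:
        """Abandon the call; close() runs any pending finally blocks in it."""
        if self.done:
            return
        self._gen.close()
        self.cancelled = True


def launch(fn: Callable[[Any], Generator[None, None, Any]], timeout: float, arg: Any) -> Continuation:
    """Start fn(arg) with a time budget, mirroring launch(function, timeout, argument)."""
    return Continuation(fn(arg)).resume(timeout)


def sum_to(n: int) -> Generator[None, None, int]:
    total = 0
    for i in range(n + 1):
        total += i
        yield                # checkpoint: the only place this sketch can pause
    return total


call = launch(sum_to, 0.0, 100_000)   # budget too small to finish
print(call.done)                      # False
call.resume(10.0)                     # the caller chooses to resume
print(call.done, call.result)         # True 5000050000
```

The design point the model preserves is that timing out does not decide the call's fate; it hands the continuation back so the caller can pick resume or cancel.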

LPFs also have a problem calling non-reentrant functions, similar to the restrictions on what signal handlers may call.  To address this, the runtime provides selective relinking, capturing what the LPF calls through the global offset table (GOT).  Some GOT entries point into dynamic libraries, while other entries initially point to the dynamic linker, which resolves them lazily.  This runtime support also needs to intercept accesses to thread-local variables.  The interception imposes about 10 ns of overhead, only a little above the cost of a function call itself.
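To picture why the GOT is a convenient hook, here is a toy Python model, not how ld.so works internally.  Each slot starts out pointing at a resolver (standing in for the dynamic linker), is patched to the real target on its first call, and can later be redirected, which is the kind of indirection selective relinking relies on.  ToyGot and its methods are hypothetical names, and the dictionary of functions stands in for a shared library's exports.

```python
from typing import Callable, Dict


class ToyGot:
    """Toy global offset table with lazy binding (hypothetical illustration)."""

    def __init__(self, library: Dict[str, Callable]) -> None:
        self._library = library                  # stands in for a library's exports
        self._entries: Dict[str, Callable] = {}
        self.resolutions = 0                     # how often the "dynamic linker" ran

    def declare(self, name: str) -> None:
        # A lazily bound slot first points at the resolver, not the target.
        self._entries[name] = lambda *args: self._resolve(name, *args)

    def _resolve(self, name: str, *args):
        self.resolutions += 1
        target = self._library[name]
        self._entries[name] = target             # patch the slot; later calls go direct
        return target(*args)

    def call(self, name: str, *args):
        return self._entries[name](*args)        # every call goes through the table

    def relink(self, name: str, replacement: Callable) -> Callable:
        """Redirect one slot (e.g. to a reentrancy-safe copy); return the old target."""
        old = self._entries[name]
        self._entries[name] = replacement
        return old


got = ToyGot({"strlen": len})
got.declare("strlen")
print(got.call("strlen", "abc"), got.call("strlen", "hello"), got.resolutions)  # 3 5 1
```

Because callers only ever go through the table, redirecting a slot changes what every later call reaches without touching the calling code.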

Microservice approaches have significant latency, often hundreds of microseconds, primarily because of the need to create a sufficiently isolated container, often via processes or virtualization.  If the microservice were instead written safely and used LPFs, its latency could be reduced toward the hardware bound, as measured by communicating between VMs or committing transactions.

Cancellation cleanup is difficult in languages, such as C, that require explicit cleanup.  In other languages, adding a new exception path for timeout and cancellation could then invoke the necessary destructors.  Nonetheless, this can be expensive (perhaps single-digit milliseconds).
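In a language with exceptions, that timeout path can be sketched as an exception raised from the timer's signal handler, so ordinary unwinding runs the cleanup code.  This Python version is a sketch under stated assumptions: POSIX only, main thread only (where Python runs signal handlers), and unlike an LPF it can only cancel, not pause and resume.  run_with_timeout and LpfTimeout are hypothetical names.

```python
import signal


class LpfTimeout(Exception):
    """Raised inside the running function when its time budget expires."""


def _on_timer(signum, frame):
    raise LpfTimeout()


def run_with_timeout(fn, timeout_s: float, *args):
    """Call fn(*args); if it runs past timeout_s seconds, unwind it with LpfTimeout."""
    previous = signal.signal(signal.SIGALRM, _on_timer)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)       # one-shot timer
    try:
        return fn(*args)
    finally:
        # Disarm first. A timer firing in the tiny window after fn returns
        # would still raise; a real runtime would have to handle that race.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


def busy_loop(log: list) -> None:
    try:
        while True:                  # never finishes on its own
            pass
    finally:
        log.append("cleanup ran")    # stands in for destructors / resource release


log = []
try:
    run_with_timeout(busy_loop, 0.05, log)
except LpfTimeout:
    print("timed out;", log)         # timed out; ['cleanup ran']
```

The cleanup comes for free from the language's unwinding, which is exactly what C lacks; the cost shows up in the unwinding itself.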

Other possible future work:
Another cost of cancellation is unloading the mirrored library, so could the runtime instead track the changes made and then determine whether to roll them back or discard them?
Is it possible to reduce the overhead of the preemption signals or to improve their granularity?
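The roll-back-or-discard idea in the first question resembles an undo log.  A minimal sketch, assuming the state an LPF modifies can be modeled as a Python dict (the actual question concerns library state in memory): each write records the previous value, so on cancel the runtime could roll back, and on completion it could simply discard the log.  UndoLog is a hypothetical name.

```python
class UndoLog:
    """Record writes made during a (hypothetical) preemptable call so they can be undone."""

    _MISSING = object()   # marks keys that did not exist before the write

    def __init__(self, state: dict) -> None:
        self.state = state
        self._log = []

    def write(self, key, value) -> None:
        self._log.append((key, self.state.get(key, self._MISSING)))
        self.state[key] = value

    def rollback(self) -> None:
        """Cancel path: restore old values, newest write first."""
        for key, old in reversed(self._log):
            if old is self._MISSING:
                del self.state[key]
            else:
                self.state[key] = old
        self._log.clear()

    def discard(self) -> None:
        """Completion path: keep the changes and forget the log."""
        self._log.clear()


state = {"a": 1}
log = UndoLog(state)
log.write("a", 2)
log.write("b", 3)
log.rollback()        # e.g. the call was cancelled
print(state)          # {'a': 1}
```

The trade-off is that every write during the call pays a small logging cost, in exchange for a cancel that no longer needs to unload anything.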