Thursday, September 12, 2019

Thesis Proposal: Theoretical Foundations for Modern Multiprocessor Hardware

Naama Ben-David gave her proposal this morning on Theoretical Foundations for Modern Multiprocessor Hardware.

Is there a theoretical foundation for why exponential backoff is a good design?  Exponential backoff is a practically developed algorithm that works well in practice, but it lacks such a theoretical foundation.

To develop such a foundation, we need a model of time; however, requests are asynchronous and do not follow a single time source.  To address this, time is modeled with adversarial scheduling.  Thus, when performing a request, there are three sources of delay:
  • self-delay: backoff, sleep, local computation
  • system-delay: interrupts, context switches
  • contention-delay: delay caused by contention
Given this model, the adversary can, to a limited degree, decide when a request that has passed from self-delay into system-delay then moves to contention-delay and is ultimately completed.

In BBlelloch'17 (Ben-David and Blelloch), this model was applied and the work was measured for different approaches; a sketch of a backoff loop follows the list.
  • With no backoff, the work is Ω(n³).
  • Exponential backoff reduces this to a Θ(n² log n) bound on work.
  • The paper also proposes a new algorithm that achieves O(n²) work with high probability.
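
To make the backoff loop concrete, here is a minimal sketch (mine, not the paper's algorithm) of a test-and-set lock acquired with randomized exponential backoff in C++; the base delay and cap are illustrative assumptions.

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <random>
    #include <thread>

    // A test-and-set style lock acquired with randomized exponential backoff.
    class BackoffLock {
        std::atomic<bool> locked_{false};
    public:
        void lock() {
            std::mt19937 rng{std::random_device{}()};
            int delay_us = 1;                  // initial self-delay bound
            const int max_delay_us = 1024;     // cap keeps the wait bounded
            while (true) {
                bool expected = false;
                if (locked_.compare_exchange_strong(expected, true))
                    return;                    // acquired: the request completes
                // Contended: self-delay for a random time up to the current
                // bound, then double the bound (the "exponential" part).
                std::uniform_int_distribution<int> dist(0, delay_us);
                std::this_thread::sleep_for(std::chrono::microseconds(dist(rng)));
                delay_us = std::min(2 * delay_us, max_delay_us);
            }
        }
        void unlock() { locked_.store(false); }
    };
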
The second phase of work is developing simple and efficient algorithms for systems that have non-volatile memory (NVRAM).  With NVRAM, on a crash or system failure, the contents of memory persist across a reboot (or other restore).  This permits the system to restore the running program(s) to a finer degree than happens with auto-saves or other current techniques.  However, systems also have caches, which are not persistent.  Caches are presently managed by hardware, which decides when to write contents back to memory.  Algorithms must therefore work with the caches to ensure that results are safely in memory at selected points of execution.  There are a variety of approaches for how to select these points.
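
As a concrete illustration of such a persist point, here is a minimal sketch, assuming an x86 machine with the CLWB instruction (compile with -mclwb): the helper explicitly writes a cache line back to memory and fences so the write-back is ordered before later stores.

    #include <immintrin.h>

    // Force the cache line holding *addr back to (persistent) memory.
    // Persistent algorithms choose carefully where to place these calls.
    inline void persist(void* addr) {
        _mm_clwb(addr);    // request write-back of the containing cache line
        _mm_sfence();      // order the write-back before subsequent stores
    }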

The third phase of work is modeling RDMA (remote direct memory access) systems.  Can there be a model of the different parts of such a system: memory, NIC (network interface card), and CPU?  Such a model would then allow exploring contention as well as possible failures in the system.

One scheme is for every process to also be able to send messages on behalf of its shared-memory neighbors, so that even if a process fails, it can still participate in algorithms such as consensus.

As this is a proposal, ongoing work will build instantiations of these algorithms to measure their practical performance.

Monday, April 8, 2019

Presentation: The Quest for Energy Proportionality in Mobile & Embedded Systems

This is a summary of the presentation on "The Quest for Energy Proportionality in Mobile & Embedded Systems" by Lin Zhong.

We want mobile and other systems to be energy efficient, and in particular to use energy in proportion to the intensity of the required operation.  However, processor architectures only have limited regions where energy is in proportion to load, given certain physical and engineering constraints on the design.  ARM's big.LITTLE gives a greater range of efficiency by placing two cores with the same ISA but different power/performance characteristics onto the same chip; however, it is constrained by the need to keep the cores cache coherent.

The recent TI SoC boards also contain another ARM core, running the Thumb ISA for energy efficiency.  This additional core was hidden behind a TI driver (originally to support MP3 playback), but was recently exposed, allowing further designs to utilize it as part of the computation.  However, this core is not cache coherent with the main core on the board.

So Linux was extended to be deployed onto both cores (compiled for the different ISAs), while maintaining the data structures, etc. in the common, shared memory space.  The application can then run and migrate between the cores, based on application hints as to the required intensity of operations.  With migration, one of the core domains is put to sleep and releases the memory to the other core.  This design avoids synchronization between the two domains, which simplifies the code; the concurrency demands in the mobile space are low, so this is acceptable.  It is also a rare demonstration of software-managed cache coherence.

In total, DVFS provides about a 4x change in power, and big.LITTLE another 5x.  The hidden Thumb core supports an additional 10x reduction in power for low-intensity tasks, such as mobile sensing.  Together, this is roughly a 200x range (4 x 5 x 10) and covers a significant part of the energy / computation space.

However, this does not cover the entire space of computation.  At the lowest end, there is still an energy-intensive ADC (analog-to-digital conversion) component, equivalent to tens of thousands of gates.  Many computations could instead be pushed into the analog domain, which saves power both because a simpler result is produced for digital consumption and because the computation can be performed on lower-quality input (tolerating noise), which reduces the energy demand.

Saturday, March 23, 2019

Repost: Code Smells ... Is concurrency natural?

Writing parallel code is not considered easy, but it can be a natural approach to some problems for novices.  When a beginner wants something to happen twice concurrently, the reasonable thing would be to do what works once, a second time.  However, this intuition can conflict with other constructs of the language, such as main() or the need to explicitly create threads.  See more here.
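
For example, here is a minimal C++ sketch of what that intuition ends up requiring: even to run the same work twice at the same time, the beginner must learn to create and join threads explicitly.

    #include <iostream>
    #include <thread>

    // The thing that works once.
    void do_work(int id) {
        std::cout << "working in task " << id << "\n";
    }

    int main() {
        std::thread t1(do_work, 1);   // ...and done again, concurrently,
        std::thread t2(do_work, 2);   // requires explicit thread objects
        t1.join();
        t2.join();
        return 0;
    }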

Thursday, March 7, 2019

Talk: Concurrent Data Structures for Non-Volatile Memory

Today, Michal Friedman gave a talk on Concurrent Data Structures for Non-Volatile Memory.

Future systems will contain non-volatile memory.  This is memory that exhibits normal DRAM characteristics, but maintains its contents even across power failures.  In current systems, caches update memory on either evictions or flushes.  Flushes, however, impose overhead due to the memory access time and because they override the write-back nature of most caches.

Linearizability is one correctness definition for concurrency, governing how operations may be observed.  It can be extended to durable linearizability for a durable system: data is flushed before it becomes globally visible (initialization), prior operations are flushed before operations that depend on them (dependence), and operations are persisted before they complete (completion).  A further extension is required to know when a sequence of operations is complete, beyond just taking snapshots of the memory state.

The talk presented relaxed, durable, and log versions of a lock-free queue that extend Michael and Scott's baseline queue implementation.  Each version provides progressively stronger guarantees: the relaxed version is the existing queue augmented with a sync operation to snapshot state, the durable version preserves the data structure across failures, and the log version additionally identifies the specific state reached.  The main guarantee is that the data structure remains consistent for any set of thread crashes, which is stronger than the lock-free guarantee.

This is done by extending the prior lock-free versions with memory flushes of key state, and by having later updates that see still-volatile state flush that state before completing their own operations.  This meets durable linearizability.  It can be extended further by also keeping a log of operations that is updated and maintained before the operations themselves execute.  These logs are per-thread, so they are unordered and individually maintain their own state.
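
As a rough illustration of that flushing pattern, here is a simplified sketch (not the implementation from the talk) of a Michael-Scott style enqueue with persist points added; persist() is the clwb+sfence idea sketched earlier in this journal, and the field names are my assumptions.

    #include <atomic>
    #include <immintrin.h>

    // Flush a cache line and fence; compile with -mclwb on x86.
    inline void persist(void* p) { _mm_clwb(p); _mm_sfence(); }

    struct Node {
        int value;
        std::atomic<Node*> next;
        explicit Node(int v) : value(v), next(nullptr) {}
    };

    struct DurableQueue {
        std::atomic<Node*> head;
        std::atomic<Node*> tail;

        DurableQueue() {
            Node* dummy = new Node(0);
            persist(dummy);
            head.store(dummy);
            tail.store(dummy);
        }

        void enqueue(int v) {
            Node* n = new Node(v);
            persist(n);                                   // node contents are durable before it is linked
            while (true) {
                Node* last = tail.load();
                Node* next = last->next.load();
                if (next == nullptr) {
                    if (last->next.compare_exchange_strong(next, n)) {
                        persist(&last->next);             // the link is durable before the operation completes
                        tail.compare_exchange_strong(last, n);
                        return;
                    }
                } else {
                    persist(&last->next);                 // saw state another thread linked: flush it before helping
                    tail.compare_exchange_strong(last, next);
                }
            }
        }
    };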

The relaxed version implements sync by creating a special object that indicates a snapshot is occurring.  If other concurrent operations find this object, they take over the snapshot and finish persisting the state before completing their own operations.  Thus a snapshot does not block other operations, but still takes effect at that point in the sequence of operations.

Based on performance measurements, the relaxed version performs similarly to the baseline implementation, while the durable and log-based implementations run slower than the relaxed version but perform similarly to each other.

Finally, TSO provides a guarantee that stores will reach the cache in the desired (program) order, so flushes are not required between writes just to maintain their order.

Saturday, March 2, 2019

Conference Attendance - SIGCSE 2019 - Day 2.5

Continuing at SIGCSE, here are several more paper talks that I attended on Friday.  Most of the value at SIGCSE comes from the friendly conversations with other attendees.  From 5-11p, I was in the hotel lobby talking with faculty and students.  Discussing research ideas, telling our stories from teaching, and generally renewing friendships within the field.

On the Effect of Question Ordering on Performance and Confidence in Computer Science Examinations
On the exams, students were offered a bonus if they could predict their score to within 10%.  Does the order of questions (easy -> hard, or hard -> easy) have any impact on their estimated or actual performance on an exam?  Students overpredicted their scores by more than 10%.  As a whole, the hard-to-easy students did worse, but this result was not statistically significant.  A small improvement is seen for women when the exam starts with the hardest problem.

I wonder about whether students were biased in their prediction based on the reward.  Ultimately, the authors gave the reward to all students regardless of the quality of their prediction.

The Relationship between Prerequisite Proficiency and Student Performance in an Upper-Division Computing Course
We have prerequisites to ensure that students are prepared for the later course, here an upper-level data structures class.  Students started on average with 57% of the expected prerequisite knowledge, and finished the course with an 8% improvement on this knowledge.  There is a correlation between prerequisite score and final score.  Across the several prerequisites, some knowledge concepts have greater correlation than others; assembly is a surprising example of a concept that correlates.  Students benefit from interventions that address these deficiencies early in the term.

Afterward, we discussed that this work did not explore what prerequisite knowledge weakly correlated with student learning.  How might we better understand what prerequisites actually support the learning in a course?  Furthermore, can we better understand the general background of students in the course, such as class standing or general experience?

Visualizing Classic Synchronization Problems
The tool covers three classic synchronization problems: dining philosophers, bounded producer-consumer, and readers-writers.  Each one has a window displaying the operations, as well as multiple algorithmic strategies.  With these visualizations, do students learn better, and do they find the problems more engaging than reading about them in the textbook?  While not statistically significant, the control group exhibited better recall, although the visualization group had higher engagement.  That said, the control group also had higher course grades, so the difference in learning may actually be from unrelated factors.

Friday, March 1, 2019

Conference Attendance: SIGCSE 2019 - Day 1.5

Back at SIGCSE again, this one the 50th to be held.  Much of my time is spent dashing about and renewing friendships.  That said, I made it to several sessions.  I've included at least one author and linked to their paper.

Day 2 began with the keynote from Mark Guzdial.

"The study of computers and all the phenomena associated with them." (Perlis, Newell, and Simon, 1967).  The early uses of Computer Science were proposing its inclusion in education to support all of education (1960s).  For example, given the equation "x = x0 + v*t + 1/2 a * t^2", we can also teach it as a algorithm / program.  The program then shows the causal relation of the components.  Benefiting the learning of other fields by integrating computer science.

Do we have computing for all?  Most high school students have no access to CS classes, and even when they do, most do not take them.

Computing is a 21st century literacy.  What is the core literacy that everyone needs?  C.f. K-8 Learning Trajectories Derived from Research Literature: Sequence, Repetition, Conditionals.  Our goal is not teaching Computer Science, but rather supporting learning.

For example, let's learn about acoustics.  Mark explains the straight physics.  Then he brings up a program (in a block-based language) that can display the sound reaching the microphone.  The learning came from the program, the demonstration, and prediction, not from writing and understanding the code itself: taking data and helping build narratives.

We need to build more, try more, and innovate.  To meet our mission, "to provide a global forum for educators to discuss research and practice related to the learning and teaching of computing at all levels."

Now for the papers from day 1:

Lisa Yan - The PyramidSnapshot Challenge

The core problem is that we usually only view student work through the completed snapshots.  They extended Eclipse with a plugin to record every compilation, giving roughly 130,000 snapshots from 2600 students.  For those snapshots, they needed an automated approach to classifying the intermediate states.  They tried autograders and abstract syntax trees, but those could not capture the full space.  But!  The output is an image, so why not try image classification?  Of the 138,531 snapshots, they generated 27,220 images.  Lisa then manually labeled 12,000 of those images with 16 labels that are effectively four milestones in development.  Then, a neural network classifier classified the images.  Plotting the milestones using a spectrum of colors (blue being start, red being perfect): good students quickly reach the complete milestones; struggling students are often stuck in early debugging stages; tinkering students (~73rd percentile on exams) take a lot of time, but mostly spend it on later milestones.  From these, we can review assignments to see whether students pass through the declared milestones, or whether a different assignment structure is required.

For the following three papers, I served as the session chair.

Tyler Greer - On the Effects of Active Learning Environments in Computing Education

A replication study on the impact of using an active learning classroom versus a traditional room.  The same instructor taught the same course, but in different classrooms and with different lecture styles (traditional versus peer instruction).  The most significant factor was the use of active learning versus traditional lecture, with no clear impact from the type of room used.

Yayjin Ham, Brandon Myers - Supporting Guided Inquiry with Cooperative Learning in Computer Organization

Starting from a computer organization course taught with peer instruction and guided inquiry, can the peer instruction be traded for cooperative learning to further emphasize engagement and learning?  The structure is exploration of a model (program, documentation), then concept invention (building an understanding), then application (applying the learned concepts to a new problem), with reflection on the learning at the end of each "lecture".  In back-to-back semesters, they measured the learning gains from this intervention, and also surveyed other secondary items (such as engagement and peer support).  However, the students in the intervention group did worse, though most of this difference is explained by prior GPA.  Across the other survey points, students in the intervention group also gave lower ratings.  The materials used are available online.

Aman, et al - POGIL in Computer Science: Faculty Motivation and Challenges

Faculty try implementing POGIL in the classroom, starting with training, then implementation in the classroom, and then continued innovation.  Faculty want to see more motivation, better retention of the material, and students staying in the course (as well as in the program).  Students show a mismatch between their learning and their perceived learning.  There are many challenges and concerns from faculty about the costs of adoption.

Thursday, November 29, 2018

Seminar Talk: Computer Science Pedagogy - Miranda Parker

This week I have been co-hosting Miranda Parker, from Georgia Tech, as part of our Colloquium on Computer Science Pedagogy.  Her work is titled Barriers to Computing: What Prevents CS for All.

The question is: what are the barriers to accessing Computer Science at the high school level?  Nation-wide, ~35% of high schools offer at least one CS course, although this is self-reported.  In Indiana, the most popular courses are taken by ~5000 students, state-wide.  In Austin, Texas, about 6000 students out of ~110,000 take at least one CS course.

How do we know if students are successful (i.e., did they learn)?

For this, we need a validated assessment to ensure that we are measuring what we want to measure.  An initial assessment, FCS1, worked fairly well across multiple introductory programming languages; however, it had limits on its use, which led to the SCS1.  This assessment correlated well with FCS1, so it can stand in for it; however, it was an hour long.  Assessments should cover the content, vary in difficulty, and have good discrimination (so individual scores are good predictors of overall performance).  In analysis, most of the SCS1 questions were hard, and few provided good discrimination.  The assessment was then adapted to focus on the medium-difficulty problems that discriminate well (in test scores), and reworked into a 30-minute test.

With a measurement of whether students are succeeding in Computer Science, we can walk back to investigate what things influence students to succeed in Computer Science.

Among previously studied factors, we know that students do better in CS if they have more prior experience with Computer Science, and that students do better in CS (as well as in STEM and general education) with higher socioeconomic status (SES).  There are also known links from SES to access to computing (whether formal courses or informal exposure), and from SES to spatial reasoning.  Both of these latter components are in turn linked to CS achievement.

An exploratory study (at a large, public university with mainly high-SES students) showed a statistical correlation from spatial reasoning to CS achievement.  There was no correlation from access to achievement.  In the interest of time, this point was not presented further.

Barriers to computing

There are three main sources of access to Computer Science courses: state policies, geography (such as local circumstances, industry, or neighboring schools with CS), and resources (such as time and money, or rural versus urban settings).

In partnership with the Georgia Department of Education, she gathered CS enrollment data from 2012-2016, plus other public data (such as characteristics of the county and of the school), for each of the 181 school districts in Georgia, where each district has at least one high school.  CS courses are defined as those that count toward the Georgia graduation requirement as computer science, versus other computing courses such as web design or office proficiency.  In Georgia, out of 500,000 students, about 6000 took a CS course in a given year, and about 50% of schools offered computer science during at least one year in that time frame.  The 6000 students are actually student-course events, where a student taking two CS courses would count twice.

For those schools, CS enrollment, total high school enrollment, and median income contribute to whether CS will be offered in the next year.

This work is still ongoing; the next steps are to visit schools and collect further data on these factors, such as why a school discontinued a course offering, or now offers one, and what the students in these courses go on to do.  An audience question wondered whether the CS courses offered relate to the courses offered in other parts of STEM.