
Wednesday, October 16, 2019

Conference Attendance - MICRO 52 - Day 2/3

These are rough notes from the other two keynotes.

Keynote Bill Dally on Domain-Specific Accelerators

Moore's Law is over.  Sequential performance is increasing at 3% per year.  And cost per transistor is steady or increasing.

Most of the power is spent moving data around, so simple ISAs such as RISC are actually power-inefficient compared to specialized operations.  With special data types and operations, the hardware can be designed so that something taking 10s to 100s of cycles is done in 1.  Memory bandwidth can still bottleneck, as "bits are bits".

Genome matching, via the Smith-Waterman algorithm, can be done in a single cycle for many bases (10), while a CPU would need roughly 35 ALU ops and 15 loads/stores.  The specialized hardware spends 3.1pJ (10% of it in memory) versus 81nJ for the CPU.
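To see why a CPU spends tens of operations per base, here is a rough sketch (my own, not from the talk) of the Smith-Waterman inner-cell recurrence; the scores and gap penalty are illustrative.

#include <algorithm>

// One cell of the Smith-Waterman dynamic-programming matrix.  A dedicated
// systolic array evaluates many of these per cycle; a CPU spends dozens of
// ALU and load/store instructions on the same update.
int sw_cell(int diag, int up, int left, char a, char b) {
    const int match = 2, mismatch = -1, gap = -1;    // illustrative scoring
    int score = diag + (a == b ? match : mismatch);  // extend along the diagonal
    score = std::max(score, up + gap);               // gap in one sequence
    score = std::max(score, left + gap);             // gap in the other
    return std::max(score, 0);                       // local alignment floors at zero
}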

Communication is expensive in power, so be small, be local.  5pJ/word for L1, 50pJ/word for LLC, and 640pJ/word for DRAM.  And most of this power is driving the wires.

Conventionally, sparse matrices need <1% set bits to be worth using, due to the overhead of pointers, etc.  However, special-purpose hardware can overcome this overhead.

A tensor core performs D = AB + C, so the question is how to execute this as a single instruction.  On a GPU, fetching, decoding, and fetching operands for an instruction costs about 30pJ.  Specialized instructions can then operate nearly as efficiently as specialized hardware, but with that overhead, which on a GPU is roughly 20%.
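For reference, a minimal sketch of the operation's semantics on 4x4 tiles (sizes are illustrative; real tensor cores fix particular tile shapes and precisions).  The hardware performs the whole tile update as one instruction, while a plain core issues dozens of multiply-adds.

void mma_4x4(const float A[4][4], const float B[4][4],
             const float C[4][4], float D[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float acc = C[i][j];               // accumulate onto C
            for (int k = 0; k < 4; ++k)
                acc += A[i][k] * B[k][j];      // inner product over the tile
            D[i][j] = acc;
        }
}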

Keynote: An Invisible Woman: The Inside Story Behind the VLSI Microelectronic Computing Revolution in Silicon Valley

Conjecture: Almost all people are blind to innovations, especially ones by 'others' whom they did not expect to make innovations.  ('others' = 'almost all people')

Basically, we rarely notice innovations, so they are ascribed to the perceived likely cause (cf. the Matthew effect or the Matilda effect).  Credit for innovations is highly visible, and many awards are given to people with certain reputations rather than to the specific innovator.

Monday, October 14, 2019

Conference Attendance - MICRO 52 - Day 1

I am in Columbus, Ohio for MICRO 52.  A third of the attendees drove from other "midwestern" universities, and I am one of them.

Keynote: Rejuvenating Computer Architecture Research with Open-Source Hardware

Moore's Law is irrelevant now, as the cost per transistor has held steady since the 28nm technology node.  The cost of any deployment depends on the development cost, and only at very large scales is the cost per transistor dominant.  Given that, how can we reduce the cost of hardware development?

There was a Cambrian explosion of (RISC) ISAs from the mid-1980s on, with a great diversity of ISAs being created and competing.  Then the Intel Pentium came out, which combined the CISC ISA with a translation into RISC micro-ops.  This extinction event destroyed most of those ISAs.

Why does the instruction set architecture (ISA) matter?  It is the dominant interface in the system, defining the interaction between software and hardware.  But ISAs are currently proprietary, and tied to the fortunes of the company.  Many ISAs have come and gone.  And then each SoC (system on a chip) gets custom ISAs for each accelerator.

So there is now the RISC-V ISA that is open for use and development (which I wrote about here).  The RISC-V foundation was formed in 2015 to be the neutral guardian of the specification and formal model.  Based on this specification, there are both open-source and commercial implementations of the hardware as well as the software ecosystem.

ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs
DRAM is designed based on commands being sent in a specific order with appropriate timings.  The oddity is that if specific commands and timings are used that violate the normal usage, then the DRAM module can perform certain operations, such as AND and OR, using three specially prepared rows (source x2 and destination).

Hybrid Skiplist: Combining the Best of Near-Data-Processing and Lock-Free Algorithms
This is a student research competition work that I want to highlight.  The work takes skip lists, a multi-level linked list that supports more efficient traversals, which have been implemented both on near-data processing (NDP) systems and as lock-free structures.  The performance of the two implementations is comparable, but we should be able to do better.  The observation is that the lock-free version gains by keeping the long, frequently-accessed links in the cache, while NDP keeps the data items close to memory.  Therefore, let's combine the two approaches so the algorithm uses the lock-free approach on the long links and leaves the rest in NDP.  A dynamic approach then adapts which nodes are in the long list and promotes them, while demoting less frequently accessed elements.
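As a refresher on the structure being split, here is a minimal sequential skip-list sketch (my own illustration, not the paper's implementation); the upper-level links are the long, frequently traversed part, and level 0 holds the full chain of data items.

constexpr int kMaxLevel = 16;

// `head` is a sentinel whose next[] pointers start each level.
struct Node {
    int key;
    Node* next[kMaxLevel];   // next[0] links every node; higher levels skip ahead
};

Node* find(Node* head, int key) {
    Node* cur = head;
    for (int level = kMaxLevel - 1; level >= 0; --level) {
        // Move right while the next key at this level is still too small.
        while (cur->next[level] != nullptr && cur->next[level]->key < key)
            cur = cur->next[level];
    }
    Node* candidate = cur->next[0];   // drop to the data level
    return (candidate != nullptr && candidate->key == key) ? candidate : nullptr;
}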

Applying Deep Learning to the Cache Replacement Problem
Let's apply machine learning to cache replacement.  Offline, an ML model can perform better than the best replacement schemes, but it requires lots of space, more than the cache itself.  Current algorithms (such as Hawkeye) use just the current PC, whereas the machine learning model includes history, so perhaps history has value.  Analyzing that history further, they noticed that the history information does not need to be complete, nor does it have to be ordered.  If it does not need to be ordered, then the history is a feature set (i.e., a bitvector) rather than a full list, and the history feature gives an index into a table of predictors for whether a line is cache friendly in usage.
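A hypothetical sketch of that final, practical form (names and sizes are mine, not the paper's): fold the unordered history features with the PC into an index over a table of saturating counters.

#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTableSize = 4096;
std::array<int8_t, kTableSize> predictor{};   // small saturating counters, start at 0

std::size_t index_of(const std::bitset<64>& history_features, uint64_t pc) {
    // The history is treated as an unordered feature bitvector, not a sequence.
    return (history_features.to_ullong() ^ (pc >> 2)) % kTableSize;
}

bool predict_cache_friendly(const std::bitset<64>& history_features, uint64_t pc) {
    return predictor[index_of(history_features, pc)] >= 0;
}

void train(const std::bitset<64>& history_features, uint64_t pc, bool line_was_reused) {
    int8_t& ctr = predictor[index_of(history_features, pc)];
    if (line_was_reused  && ctr <  3) ++ctr;   // reinforce "cache friendly"
    if (!line_was_reused && ctr > -4) --ctr;   // reinforce "cache averse"
}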

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
This is a release of a Pin-like tool, but for GPUs.  Using the framework, you can write specific instrumentation to be applied to CUDA kernels.  The framework analyzes the kernel to find the instrumentation points and then recompiles / JITs the code to integrate the requested instrumentation into the kernel, without requiring the kernel's source code.  Instrumentation such as counts or traces of the specific instructions executed, from which you can build a simulator or error checker.

Friday, March 1, 2019

Conference Attendance: SIGCSE 2019 - Day 1.5

Back at SIGCSE again, this one the 50th to be held.  Much of my time is spent dashing about and renewing friendships.  That said, I made it to several sessions.  I've included at least one author and linked to their paper.

Starting on day 2, we begin with the Keynote from Mark Guzdial

"The study of computers and all the phenomena associated with them." (Perlis, Newell, and Simon, 1967).  The early uses of Computer Science were proposing its inclusion in education to support all of education (1960s).  For example, given the equation "x = x0 + v*t + 1/2 a * t^2", we can also teach it as a algorithm / program.  The program then shows the causal relation of the components.  Benefiting the learning of other fields by integrating computer science.

Do we have computing for all?  Most high school students have no access to computing courses, and they often do not take the classes even when they do.

Computing is a 21st century literacy.  What is the core literacy that everyone needs?  C.f. K-8 Learning Trajectories Derived from Research Literature: Sequence, Repetition, Conditionals.  Our goal is not teaching Computer Science, but rather supporting learning.

For example, let's learn about acoustics.  Mark explains the straight physics.  Then he brings up a program (in a block-based language) that can display the sound reaching the microphone.  So the learning came from the program, demonstration, and prediction.  Not from writing and understanding the code itself.  Taking data and helping build narratives.

We need to build more, try more, and innovate.  To meet our mission, "to provide a global forum for educators to discuss research and practice related to the learning and teaching of computing at all levels."

Now for the papers from day 1:

Lisa Yan - The PyramidSnapshot Challenge

The core problem is that we only view student work through the completed snapshots.  They extended Eclipse with a plugin to record every compilation, giving 130,000 snapshots from 2600 students.  From those snapshots, they needed an automated approach to classifying the intermediate states.  They tried autograders and abstract syntax trees, but those could not capture the full space.  But!  The output is an image, so why not try image classification.  Of the 138531 snapshots, they generated 27220 images.  Lisa then manually labeled 12000 of those images into 16 labels that are effectively four milestones in development.  Then a neural network classifier classified the images.  Plot the milestones using a spectrum of colors (blue being start, red being perfect).  Good students quickly reach the complete milestones.  Struggling students are often stuck in early debugging stages.  Tinkering students (~73rd percentile on exams) take a lot of time, but mostly spend it on later milestones.  From these, we can review assignments and whether students pass through the declared milestones, or whether a different assignment structure is required.

For the following three papers, I served as the session chair.

Tyler Greer - On the Effects of Active Learning Environments in Computing Education

Replication study on the impact of using an active learning classroom versus traditional room.  Using the same instructor to teach the same course, but using different classrooms and lecture styles (traditional versus peer instruction).  The most significant factor was the use of active learning versus traditional, with no clear impact from the type of room used.

Yayjin Ham, Brandon Myers - Supporting Guided Inquiry with Cooperative Learning in Computer Organization

Taking a computer organization course with peer instruction and guided inquiry, can the peer instruction be traded for cooperative learning to further emphasize engagement and learning?  Exploration of a model (program, documentation), then concept invention (building an understanding), then application (applying the learned concepts to a new problem).  Reflect on the learning at the end of each "lecture".  In back-to-back semesters, they measured the learning gains from this intervention, as well as surveyed other secondary items (such as engagement and peer support).  However, the students in the intervention group did worse, most of which is explained by prior GPA.  And across the other survey points, students in the intervention group rated lower.  The materials used are available online.

Aman, et al - POGIL in Computer Science: Faculty Motivation and Challenges

Faculty try implementing POGIL in the classroom.  Start with training, then implementing in the classroom, and continued innovation.  Faculty want to see more motivation, retaining the material, and staying in the course (as well as in the program).  Students have a mismatch between their learning and their perceived learning.  There are many challenges and concerns from faculty about the costs of adoption.

Tuesday, February 27, 2018

Conference Attendance SIGCSE 2018

I have just finished attending SIGCSE 2018 in Baltimore.  In contrast to my earlier conference attendance, this time I have had higher involvement in its execution.

On Wednesday I went to the New Educator's Workshop (NEW).  Even having been faculty for two years, there were still a number of things that were either new or good reminders, such as including or discussing learning objectives with each lecture and assignment, or being careful with increasing one's level of service.  As a new faculty member, each service request seems exciting, as no one has asked me before!  But many senior faculty emphasized that this is the time in which they are protecting us from lots of service opportunities so that we can spend time on our teaching and research.

On Thursday morning, I presented my recent work that updated a programming assignment in Introduction to Computer Systems, from which we saw improvements in student exam scores.  We did not isolate the specific cause, and are therefore left with two theories.  First, the improvement could be from using better style in the starter code and emphasizing this style in submissions.  Second, we redesigned the traces to require submissions to address different cases and thereby implement different features.  I lean toward the former, but have no data-driven basis for this hypothesis.

Let's discuss active learning briefly.  I attended (or ran) several sessions focused on this class of techniques.  The basic idea is that students have better engagement and learning by actively participating in class, and there are a variety of techniques that help increase student activity.  On Thursday afternoon, Sat Garcia of USD presented Improving Classroom Preparedness Using Guided Practice, which showed how student learning improved from participating in Peer Instruction, which particularly requires students to come to class prepared.  Shortly after, Cynthia Taylor joined Sat and me in organizing a Birds of a Feather (BoF) session on using Active Learning in Systems Courses.  We had about 30-40 attendees there, split into two groups discussing techniques they have used and problems they have observed.  Five years ago, a similar BoF had attendance around 15-20, so we are making progress as a field.

On Friday, I spoke with Brandon Myers, who has done work on using POGIL in Computer Organization and Architecture.  In POGIL, students work in groups of 3-4 with specific roles through a guided learning activity, which leads them to discover the concepts themselves.  We had a nice conversation and may be merging our draft resources.  This last point is often the tricky part of using active learning, in that developing reasonable materials can be both time intensive and require several iterations.

The Friday morning keynote presentation was given by Tim Bell, who spoke about K-12.  This topic is rather distant from my own work and research, so I was skeptical.  Yet, I came out quite enthused.  It was interesting to think about presenting Computer Science concepts in non-traditional ways, based initially on having to explain your field at elementary school when the other presenters are a cop and a nurse (his example).  How could you get 6 year olds to sort?  Or see the advantage of binary search as the data grows?

In the afternoon, I was a session chair for the first time.  I moderated the session on Errors, so obviously the AV system stopped working for a short duration.  Beyond that incident, the session seemed to go well.

I always like going to SIGCSE.  It is rejuvenating and exhausting.  So many teachers to speak with about courses, curriculum, and other related topics.  And then you find that you've been social for 16 hours or so.

Saturday, March 11, 2017

Conference Time: SIGCSE 2017 - Day 2

I started my morning by attending my regular POGIL session.  I like the technique and using it in the classroom.  However, I should probably make the transition, attend the (all / multi-day) workshop, and perhaps get one of those "ask me about POGIL" pins.

Lunch was then kindly provided by the CRA for all teaching-track faculty in attendance.  There is the start of an effort to ultimately prepare a memo to departments for how to best support / utilize us (including me).  One thing for me is the recognition of how to evaluate the quality of teaching / learning.

Micro-Classes: A Structure for Improving Student Experience in Large Classes - How can we provide the personal interactions that are valuable when enrollments are large / increasing?  We have a resource that is scaling - the students.  The class is partitioned into micro-classes, with clear physical separation in the lecture room, and each micro-class has a dedicated TA / tutor.  Did this work in an advanced (sophomore / junior) class on data structures?

Even though the same instructor taught both the micro and the control class, the students reported higher scores for the instructor for preparedness, concern for students, etc.  Yet, there was no statistical difference in learning (as measured by grades).

Impact of Class Size on Student Evaluations for Traditional and Peer Instruction Classrooms - How can we compare the effectiveness of peer instruction being used in courses of varying class sizes?  For dozens of courses, the evaluation scores for PI and non-PI classes were compared.  There was a statistical difference between the two sets, particularly for evaluating the course and instructor.  This difference exists even when splitting by course.  This difference does not stem from the frequency of the course, nor the role of the instructor (teaching, tenure-track, etc.).

Wednesday, February 8, 2017

Conference Attendance CGO (etc) 2017 - Day 3

Today is the last day of the conference.  I attended several more talks today, and my student took 2nd in the student research competition.  So it has been a worthwhile visit, and I have received some interesting and valuable feedback on my own work, as well as found some possible collaborations for the future.

Optimistic Loop Optimization
The compiler wants to optimize loops in the program.  However, C and LLVM IR have many complex characteristics that the polyhedral model cannot represent, such as aliasing, wrapping, or out-of-bounds accesses.  Rather than just assuming that these characteristics are not present, the code can be analyzed to determine which violating characteristics may be present.  These assumptions are placed in the code and can then be reduced to the set of preconditions under which the optimized loop can execute.  Should the assumptions fail, the code instead branches to the original version.  The preconditions themselves can also be simplified (for example, N < 127 implies N < 128).  For SPEC200x, the assumptions fail about 2% of the time and impose 4% runtime overhead.
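A minimal sketch of the versioning idea (the specific checks are illustrative, not the paper's analysis): guard the optimized loop with the runtime preconditions that static analysis could not prove, and fall back otherwise.

#include <cstddef>

void scale(float* a, const float* b, int n, float s) {
    // Preconditions the static analysis could not prove: no aliasing, no index wrap.
    const bool no_alias = (a + n <= b) || (b + n <= a);
    const bool no_wrap  = (n >= 0) && (static_cast<std::size_t>(n) < (1u << 30));
    if (no_alias && no_wrap) {
        // Optimized version: the compiler may vectorize freely under the
        // now-verified assumptions.
        for (int i = 0; i < n; ++i)
            a[i] = b[i] * s;
    } else {
        // Fall back to the original, conservative loop.
        for (int i = 0; i < n; ++i)
            a[i] = b[i] * s;
    }
}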

Software Prefetching for Indirect Memory Accesses
What should we prefetch in software?  A[x + N] is easy for hardware, A->next is hard for everyone, while A[B[x + N]] is easy for software and hard for hardware to predict.  So given a loop with this indirect structure (such as exists in NAS IS), prefetches can be inserted that will speed up the loop.  Yet there are three key points for inserting these prefetch instructions (a sketch follows the list):
- You have to prefetch both the A[B[]] and the B[].  Otherwise, the prefetch of A will stall on the demand load of B[].
- Having both prefetches requires that they are both offset from the actual access, as well as from each other.  Too close and they are not parallel; too far and the data is evicted before use.
- The first point means there is an actual load (not a prefetch) of B[], which therefore needs to be bounds checked.
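A sketch of the resulting loop (offsets and names are illustrative; __builtin_prefetch is the GCC/Clang intrinsic):

#include <cstddef>

void indirect_sum(double* sum, const double* A, const int* B, std::size_t n) {
    constexpr std::size_t kOffB = 64;   // prefetch the index array well ahead
    constexpr std::size_t kOffA = 32;   // ...and the indirect target somewhat closer
    for (std::size_t i = 0; i < n; ++i) {
        __builtin_prefetch(&B[i + kOffB]);          // prefetch of B[] (address only, no fault)
        if (i + kOffA < n)                          // bounds check: this is a real load of B
            __builtin_prefetch(&A[B[i + kOffA]]);   // prefetch of A[B[]]
        *sum += A[B[i]];
    }
}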

Tuesday, February 7, 2017

Conference Attendance CGO (etc) 2017 - Day 2

Several of the talks were great and very interesting.  Other talks particularly needed further presentation practice.  Unfortunately, sometimes this may come from English being a second language.  And so I am torn between wanting presenters to get practice and the field to be open to a wider pool of researchers, while also wanting to be able to easily understand the presentations.

Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation
Let's start by considering code that normalizes a vector.  This code takes 0.3s to run.  Then switch the "for" with a "cilk_for", and the execution time becomes 180s (w/ 18 cores).  When the compiler sees "cilk_for" or other parallel keywords, it generally converts these into runtime function calls that take a function pointer for the parallel component (similar to thread-creation routines taking a function to execute).  With the function call in place, many optimization passes cannot cross the call, whereas previously they could cross the "for".
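A sketch of why that lowering hurts (the runtime entry point and helper are hypothetical stand-ins, not the actual Cilk API): the loop body is outlined into a separate function behind an opaque call, so the loop-invariant norm() computation can no longer be hoisted.

#include <cmath>

double norm(const double* v, int n) {                // illustrative helper
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += v[i] * v[i];
    return std::sqrt(s);
}

// Original source:  cilk_for (int i = 0; i < n; ++i) out[i] = in[i] / norm(in, n);

// What a front-end lowering effectively generates:
struct LoopCtx { double* out; const double* in; int n; };

static void loop_body(void* arg, int i) {
    LoopCtx* ctx = static_cast<LoopCtx*>(arg);
    // norm() is loop-invariant, but the optimizer can no longer see the loop.
    ctx->out[i] = ctx->in[i] / norm(ctx->in, ctx->n);
}

// Serial stand-in for the runtime's parallel-for entry point (hypothetical name).
static void runtime_parallel_for(int lo, int hi, void (*body)(void*, int), void* arg) {
    for (int i = lo; i < hi; ++i) body(arg, i);
}

void normalize(double* out, const double* in, int n) {
    LoopCtx ctx{out, in, n};
    runtime_parallel_for(0, n, &loop_body, &ctx);    // opaque to most optimization passes
}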

Instead, the work proposes three new instructions to include in the LLVM IR.  Supporting these instructions required approximately 6000 lines of changes.  When the updated LLVM compiles a set of parallel programs, most can now reach 99+% work efficiency, which indicates that the parallel overhead is near 0.

Prior work would create parallel tasks symmetrically; for example, each task would represent a separate path in the classic "diamond" CFG.  The problem is that the parallel program is actually taking both paths concurrently, which is not an expected behavior of control flow.  Instead, the IR is asymmetric, so that compilers can continue to reason about the basic blocks as they would for sequential code.

Incremental Whole Program Optimization and Compilation
This covers the feature within Microsoft's Visual Studio compiler.  Each component stores hashes of the components on which it depends.  When a file is changed, it generates different hashes, which the compiler can then use to determine which dependent components need to be re-analyzed and code gen'd.  If hashes change, the invalidation propagates; otherwise the compilation process completes.

Optimizing Function Placement for Large-Scale Data-Center Applications
The common binaries for Facebook are 10s-100s of MBs in size.  These binaries have IPCs less than 1.0 (recall that processors can run above 2.0, and higher is better) and experience frequent front-end stalls attributable to iTLB and I$ misses (as high as 60 per 1000 instructions, eww).  Hardware profilers determine the hot functions, and this information is then processed to decide which hot functions should be clustered together.  These clusters are mapped to separate loader sections that are loaded using huge pages.

Minimizing the Cost of Iterative Compilation with Active Learning
There are too many possibilities for optimization.  Let's ask machine learning to figure this out.  The danger is always finding the right level of training to provide valuable insights without overfitting, etc.


Monday, February 6, 2017

Conference Attendance CGO (etc) 2017 - Day 1

I started the conference attendance this time on Saturday, with the LLVM-Performance workshop, at which I presented an extension to my dissertation work.  I received some interesting and useful feedback from the other attendees, as well as saw additional possibilities for its usage and collaboration.  Now that it is Monday, it is time to attend some conference talks.  This evening, I will be acting as an advisor, watching one of my students present our work, which we practiced today, so it should go great!

Checking Concurrent Data Structures Under the C/C++11 Memory Model
C/C++11 included additional keywords that allow specifying features of the memory model (previously covered).  In order to check data structure implementations, the data structures need to be further annotated so as to describe valid and invalid executions.  For example, is a queue required to always return an element, or can it fail if an element was only recently added?  Using these annotations, the authors were able to find bugs and other issues in the data structures.

Efficient Abortable-locking Protocol for Multi-level NUMA Systems
The impact of NUMA can be significant.  On the largest shared-memory machines, the difference between accessing lock data that is local to an SMT thread versus the farthest distance is over 2000x.  To avoid this overhead, a hierarchy of locks is created that mirrors the system's topology, where each level of the hierarchy acts as an MCS-style queue lock.  How then can these threads abort from their queue?  At a single level, threads mark their status as aborted and are then skipped when handing off the lock.  In the hierarchy, the obligation to wait at a specific level is passed along to the appropriate thread, which can be determined using the lower-level links.
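For context, here is a minimal single-level MCS queue lock (the building block stacked per NUMA level); this sketch omits the paper's abort path and hierarchy.

#include <atomic>

struct MCSNode {
    std::atomic<MCSNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

struct MCSLock {
    std::atomic<MCSNode*> tail{nullptr};

    void lock(MCSNode* me) {
        me->next.store(nullptr, std::memory_order_relaxed);
        me->locked.store(true, std::memory_order_relaxed);
        MCSNode* prev = tail.exchange(me, std::memory_order_acq_rel);
        if (prev != nullptr) {
            prev->next.store(me, std::memory_order_release);  // enqueue behind prev
            while (me->locked.load(std::memory_order_acquire))
                ;  // spin on a local flag, not on the shared tail
        }
    }

    void unlock(MCSNode* me) {
        MCSNode* succ = me->next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            MCSNode* expected = me;
            if (tail.compare_exchange_strong(expected, nullptr,
                                             std::memory_order_acq_rel))
                return;  // no waiter arrived; lock is now free
            while ((succ = me->next.load(std::memory_order_acquire)) == nullptr)
                ;  // a waiter swapped the tail but has not linked in yet
        }
        succ->locked.store(false, std::memory_order_release);  // hand off to successor
    }
};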

The implementation was model checked and then run on a HP Superdome with 576 hardware threads.  The results showed that the lock implementation performs best when it respects all 4 levels of the NUMA (and NUCA) hierarchy.

Thread Data Sharing in Cache: Theory and Measurement
In this work, they collected memory access traces using Pin to determine the thread data sharing in parallel programs.  In particular, they worked to show how the different metrics of data sharing can be derived from a single pass over the trace, with analysis that is linear in the trace size.  The concern is that trace-based analysis approaches are very slow and could potentially skew the results.  And when the results are derived from only one trace, there is an additional question about their applicability.


Friday, October 14, 2016

Conference Attendance Teaching and Learning Summit 2016 - Keynote

Critical Thinking: Why is it so hard to teach? - Dr Daniel T. Willingham

Critical thinking is intertwined with content knowledge.  We saw a sequence of four examples (if vowel then even number; if alcohol then 21; if gin then haddock; if entering then cholera vaccine).  For each example, there is a claim about a set of cards: if X then Y.  Given four cards, verify the claim.  When the problems were formulated in terms of permissions, the success rate was high.  Each problem is technically P -> Q, but having just completed a semester of logic has no impact on the results.

Scientific reasoning is taught in two pieces: scientific concepts and the scientific method.  So consider designing a learning experiment.  The group is split into intervention and control.  How do you know that the random split is valid?  Background knowledge is required to determine the appropriateness of the split.

Critical thinking occurs from learning at the deep level.  The surface story is, say, "tumors and rays".  The deep question is whether it is modus ponens, Newton's third law, etc.  However, memory is focused on the surface facts, and recall is based on those components.

Why not teach the deep structure immediately?  Abstractions are hard to understand.  Instead, learners have to see lots of surface structures all overlaying the same deep structure.

Sometimes failures in critical thinking are actually failures in basic knowledge.  Furthermore, there are also innate biases, such as words refer to objects and attributes, and the world is full of agents and purposes.

Takeaway 1: Most of critical thinking is domain-specific.
Takeaway 2: In each domain, faculty should identify what they consider the important critical thinking skills.
Takeaway 3: Select content with an eye toward teaching these skills.  Teach the critical thinking in the context of the content.
Takeaway 4: Critical thinking is a curricular issue.  These skills require more than 1 semester to acquire.
Takeaway 5: Certain foundational concepts may run counter to the mind's biases.  Students have functional knowledge that has worked so far.  For example, "equals sign means put answer here".

Q. Translating domain skills in interdisciplinary work?
A. Don't know.  Probably one needs to know enough of the skills in the home domain to be able to explore the other domain.

Q. If critical thinking is domain specific, how specific are domains?
A. Domains are nested.  Proper application requires domain knowledge.  Moving from cognitive psychology to social psychology leaves [the speaker] less skilled, but still better than average.  In clinical psychology, there is a common basis, but limited ability to apply it.

Wednesday, March 9, 2016

Repost: How I spent my time at SIGCSE

When I attend a conference, I try to prepare blog posts detailing the presentations that I see and other items of objective content.  More important are the people in attendance.  I spend more time meeting with colleagues than I do actually sitting in the sessions.  Before you are shocked, understand that much of our conversation is about these sessions.  Depending on the conference, there can be between 2 and 10(?) sessions occurring concurrently.  I cannot be in 2, let alone 10, places at once, so instead we sample (with the appearance of randomly selecting) the sessions and then discuss them.

I met Janet Davis, who now heads the CS program at Whitman College, during my first time at SIGCSE.  I value her different perspective and always try to seek her out at some point during the conference.

These themes merge in her blog post that shows some idea as to how each day is organized and how the sessions (papers, panels, etc) play a part in a very busy schedule.

Friday, March 4, 2016

Conference Attendance SIGCSE 2016 - Day 2

After lunch when we are all in food comas, let's attend the best paper talk!
A Multi-institutional Study of Peer Instruction in Introductory Computing -
This study followed 7 instructors across different institutions as they used peer instruction.  It showed that the technique is generally recognized as valuable, while also touching on ways in which it can go awry.  Tell students why this technique is being used and what its effect is.  Hard questions are good questions to ask, as students will discuss and learn from the question.  This requires that questions are graded for participation and not *correctness*.  Possible questions and material for peer instruction are available.

Development of a Concept Inventory for Computer Science Introductory Programming -
A concept inventory is a set of questions that carefully tease out student misunderstandings and misconceptions.  Take the exams and identify both the learning objective and the misconception that results in incorrect answers.

#include <stdio.h>

int addFiveToNumber(int n)
{
  int c = 0;
  // Insert line here
  return c;
}

int main(int argc, char** argv)
{
  int x = 0;
  x = addFiveToNumber(x);
  printf("%d\n", x);
  return 0;
}

a) scanf("%d", &n);
b) n = n + 5;
c) c = n + 5;
d) x = x + 5;

Each incorrect answer illustrates a different misconception: for example, that input must come from the keyboard, or that variables are passed by reference.
Overall, this study illustrated how the concept inventory was developed, but not the impact of having it, or what it showed about the students and their learning.

Uncommon Teaching Languages - (specifically in intro courses)
An interesting effect of using an uncommon language in an introductory course is that the novices and experts have similar skills.  Languages should be chosen to minimize churn; otherwise students feel that they haven't mastered any language.  Related to this point, languages also exist in an institutional ecosystem.  Furthermore, we want to minimize the keywords / concepts required for a simple program.  A novice will adopt these keywords, but they are also "magic" and arcane.  And consider how long the programs are, as we want novices to write only short code to start.

I also attended the SIGCSE business meeting and then the NCWIT reception.  I have gone to NCWIT every year at SIGCSE, as I want to know what I should do (or not do) to not bias anyone's experience in Computer Science.

Thursday, March 3, 2016

Conference Attendance SIGCSE 2016 - Day 1

Here I am at SIGCSE again.  This is a wonderful opportunity to think and reflect on how I assist students in learning Computer Science and to be Computer Scientists.  And to connect with other faculty, researchers, etc who are interested in teaching and doing so in a quality manner.

An Examination of Layers of Quizzing in Two Computer Systems Courses -
In this work, the instructor taught the Intro Computer Systems course based on Bryant and O'Hallaron's book (paid link).  After several years of teaching, she introduced a new layer of quizzing to the course.  Effectively, before each class students take a pre-quiz worth ~0% of their grade (20 quizzes combine to 5%), and can then come to class with knowledge of and feedback on their deficiencies.  From the experience of the quizzes, students have been doing better in these courses.

Subgoals Help Students Solve Parsons Problems -  (previewed at Mark Guzdial's blog)
When learning new things, students benefit from labeling the subgoals used in solving.  These labels provide a basis for solving similar problems.  There are two different strategies for labeling: students can provide the labels, or the assignment can provide the labels.  An example labeling can be found with loops: initialize, test, change.  If students provide the labels and those labels generalize across problems, they do best.  If they provide the labels and the labels are problem-specific, such as "are there more tips" (with respect to an array of tips), then these students do worse than those provided the labels.  Developing labels can be valuable, but it may require the expert to still provide guidance to help abstract them across problems.  This talk had one of the great moments when someone asked a question and Brianna replied with, "So and so has done great ..."  And the questioner pointed out that he is "so and so".

As CS Enrollments Grow, Are We Attracting Weaker Students?: A Statistical Analysis of Student Performance in Introductory Programming Courses Over Time -
In this study, one instructor has analyzed the data of student assignment grades across 7 years of Fall semesters in the CS 1 course.  Several specific and clear reasonings were applied to get a clear and comparable data set.  The first test is that the number of student withdrawals remained the same as a percentage of the total class size.  The second test is that the means of the grades for the courses are statistically indistinguishable.  The third test is to use a mixture model (weighted combination of distributions) for each class's scores.  A good fit is found with two gaussian distributions, such that there is one for the "good students" and a second for the high variance students who are "potentially weaker".  From this, the study concluded that (at Stanford, in Fall CS1), there are more "weak students" and more "strong students" as the student enrollment is drawing from the same larger population.

A (Updated) Review of Empiricism at the SIGCSE Technical Symposium -
Using the proceedings from SIGCSE 14 and 15, they examined the empirical evaluations and their characteristics.  How was the data collected in each paper?  And what was being evaluated (pedagogy, assignments, tools, etc.)?  Is the subject novel, or replicating other studies?  Based on this study, would SIGCSE benefit from a separate track for longer paper submissions?  Or workshops on how to empirically validate results?  This and other material is being developed under an NSF grant and released publicly.

Birds of a Feather -
In the evening, I attended two Birds of a Feather sessions.  Both of which have given me further ideas for what I might do to further (attempt to) improve student learning.  And also possible collaborators toward that end.

Tuesday, January 19, 2016

Conference Attendance HiPEAC - Day 2 - Papers

Another conference day.  Much of my time is spent talking with other attendees and doing "work", such as preparing my presentation, sending emails, etc.  However, I do take some time to actually sit in on other presentations, so here are two highlights:

PARSECs - This work explores rewriting some of the PARSEC benchmarks to use task-based parallelism (OpenMP tasks) rather than pthreads.  For many workloads, these changes provide improved scaling.  For almost all workloads, the code size was reduced, as the original thread pools, job queues, etc. could be removed.  In the near future, these revised versions should be released.

HRF-Relaxed - The original OpenCL had no memory model; however, many vendors implemented one.  Now, C++ and other languages use SC for DRF (sequential consistency for data-race-free programs).  Unfortunately, if you use this consistency model in OpenCL, you will lose performance.  Instead, this work proposes a hierarchical race-free model, where the races are only checked at a certain scope of the program.

Monday, January 18, 2016

Conference Attendance HiPEAC - Day 1 - MULTIPROG

It is once again conference time.  For North Americans, this might seem rather early, as I am writing from Prague, Czech Republic (or at least I was when I started 12 hours ago).  I am attending HiPEAC, which is the premier European computer architecture conference.  HiPEAC is a dual-track conference.  Throughout the three days there is the paper track, where the papers accepted to TACO (such as mine) are presented, and simultaneously there are workshops.  For the first day, I am starting with the MULTIPROG workshop, which is on Programmability and Architectures for Heterogeneous Multicores.

Let's start with the keynote, given by David Kaeli of Northeastern University.
- Concurrent execution of compute kernels
- Scheduling of kernels, deadlines
- Sharing / access to host memory (i.e., RAM)

The current model of using a GPGPU is that it runs 1 computation kernel; however, there are many problems that would better decompose into several separate kernels.  It would also be valuable if there were further examples of these problems (i.e., benchmarks).  Now, whenever you try running multiple anything on a computational resource, there is a runtime scheduling problem: which should run to best complete the overall problem?  A follow-on research question explores this in a cloud-based environment, where the GPU may be shared across entirely independent compute kernels.  This requires the kernels to be tagged with IDs to ensure that their memory is kept separate.  All of this sounds as if we need an OS for the GPU.

Following the late-morning break, we heard next from MECCA (MEeting the Challenges in Computer Architecture) about the 3Ps: parallelism, power, and performance.  Consider parallel program annotations for describing the concurrency, and runtime management of caches that uses those annotations to indicate the flow of data: transfer the data before it is required, in the appropriate coherence states, and indicate when a block is dead and can be evicted from the cache.

Then there was lunch, resting from my flights, then networking, especially the part where I stood by my poster and discussed my research for 3 hours.  Now to rest for day 2.

Wednesday, June 17, 2015

Conference Attendance FCRC - Day 5 - Plenary Summary

Plenary Talk today, which pulls together all of the conference attendees.  Sunday's talk was based in databases, with Michael Stonebraker speaking on his Turing-award winning work.  Monday's talk discussed interdisciplinary work, primarily centered in CS theory, and was given by Andrew Yao (a prior Turing Award winner).  On Tuesday, Olivier Temam discussed neural networks in hardware, which focused on his work and efforts to better model or mimic the capabilities of the brain.

The F# Path to Relaxation -
There are opportunities to introduce new work toward relaxing and improving, or perhaps to create opposing camps.  Thesis <-> Antithesis ==> Synthesis.  Or Functional <=> Interop.  Back in 2003, functional languages were isolated and non-interoperable, using their own VMs.  F# (along with Scala, Swift, ...) instead seeks to have an exosystem, using the external, industry-standard runtimes.  Another tension is between Enterprise and Openness, so F# is open and cross-platform.  Tools are available for Android and iOS, as well as packages for Linux.

Functional <=> Objects
Thus, embrace objects without being object-oriented.  Some cases in the cross-product of the expected features for objects and functions require particular care for synthesis.

Circularities and Modularity in the Wild
Lambdas, generics, etc are clearly being embraced in modern language design.  However, circular type dependencies are unfortunately also widely present.  Languages need to enforce acyclicity.

Pattern Matching <=> Abstraction
How does the language support the functional concept of pattern matching, when you want to include type abstraction?  Alas, the speaker skipped the solution quickly.

Code <=> Data
Most development is about providing tools for the information revolution.  There is exponential growth in open APIs for accessing data from the internet.  This data then comes with dynamic types, where the types are only known once the data (or schema) has been accessed.  The type creation can also enable blending code from other languages into the F# environment.  For example, the support can allow opening CSV or JSON files and having types for the data.  This feature is, by far, the most exciting and interesting of the presentation.  Not quite worth the price of admission, but clearly a great development.

Applied PL design comes from the synthesis at the heart of these contradictions.  This tension also is part of the proliferation of languages.

Conference Attendance FCRC - Day 4 - PLDI

PLDI starts off this morning with Concurrency.  As a student volunteer, I worked this session and was limited as to what I could note about the content itself.

Composing Concurrency Control - Introducing more diverse and finer-grained locking mechanisms.  The tool works to develop a locking strategy that will guarantee serializability, abort-safety, opacity, and deadlock-freedom.  It particularly works to integrate both locking schemes as well as transactional memory.

In the afternoon, I can dive into the semantics of the C language.

A Formal C Memory Model Supporting Integer-Pointer Casts - What optimizations are possible in the presence of pointers, pointer arithmetic, and integer-pointer casts?  For example, can constants be propagated, or is their location potentially targetable by a pointer?  Other optimizations are explored in their paper.  In practice, as code can generate arbitrary addresses, how can the compiler reason about any specific location in memory?
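A small illustration of the tension (my own example, not from the paper): once an address has escaped as an integer, the compiler must assume a later cast back to a pointer might alias the global, so folding the read of g to a constant would be unsound.

#include <cstdint>
#include <cstdio>

int g = 1;

int observe_and_store(std::uintptr_t p) {
    *reinterpret_cast<int*>(p) = 5;   // may or may not target g
    return g;                         // constant-propagating 1 here would be wrong
}

int main() {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&g);
    std::printf("%d\n", observe_and_store(addr));   // prints 5
    return 0;
}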

Defining the Undefinedness of C - This extends their prior work that gave semantics to the defined behavior of C programs, which required doubling the rules to describe the semantic behavior.  Fundamentally, any instance of undefined behavior that will definitely be encountered in an execution invalidates that execution.  For example, a program that divides by zero after a printf may validly crash before the printf.  The following code example is also undefined.
return (x = 1) + (x = 2);  // unsequenced modifications of x
Many of these cases are dependent on runtime behavior, and therefore a tool that can help identify them is valuable.

Monday, June 15, 2015

Conference Attendance FCRC - Day 3 - PLDI / ISCA

PLDI itself began this morning and after the welcome, we had three distinguished papers.  I am excited that two of these works focused on code performance and compilers, rather than higher-level programming language issues:

Automatically Improving the Accuracy of Floating Point Expressions - How do you address rounding error in your code?  Using formal numeric methods, an expert can reduce the errors.  But rather than requiring an expert, they wrote a tool that uses heuristics to apply these methods.  For example, what error do you have when evaluating the quadratic formula?  Based on just the value of b, there are different expressions that have much lower error.

The tool, Herbie, estimates the accuracy of the expression and then attempts to apply algebraic transformations (from a database of 120 rules).  Having generated many candidate expressions, the tool then uses dynamic programming to select an appropriate set of expressions across the input space.  First, it matches the example cases from Hamming's Numerical Methods book.  Furthermore, it has found bugs in existing projects.
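The quadratic formula case is the classic example: when b is large and positive, -b + sqrt(b*b - 4ac) cancels catastrophically.  A sketch of the well-known rearrangement (assuming real roots and a != 0) that such tools rediscover:

#include <cmath>
#include <utility>

std::pair<double, double> quadratic_roots(double a, double b, double c) {
    double disc = std::sqrt(b * b - 4 * a * c);
    // Compute the well-conditioned root first: never subtract nearly equal values.
    double q = (b >= 0) ? -0.5 * (b + disc)
                        : -0.5 * (b - disc);
    return {q / a, c / q};   // the other root follows from the product of the roots
}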

Diagnosing Type Errors with Class - SHErrLoc works to identify the likely cause of type errors.  Expressions are given constraints.  These constraints form a graph, which is analyzed for failing paths in the graph.  The tool then attempts to localize the failure and identify the minimal change to the constraints to satisfy the graph.  Even though it is not Haskell specific, it is more accurate at detecting type errors in Haskell programs than related work.

Provably Correct Peephole Optimizations with Alive - Compilers are buggy.  For example, LLVM's InstCombine is a pass that exploits the LLVM IR to improve performance and contains many hand-rolled transformations.  The work proposes a DSL for describing peephole optimizations, where the DSL is basically a simplified LLVM IR annotated with preconditions for the transformation.  The expression describing the transformation is then passed through constraint checkers to verify it is correct, and C++ code is generated for that transformation.

Correctness means the transformation must not introduce new undefined behaviors, must still produce the same result, and must properly update the memory state.  Initially, they proved the optimizations in InstCombine correct or identified bugs, and eventually they could replace the pass with the generated version.  Furthermore, Alive was able to strengthen the post-conditions for many instructions (for example, identifying whether an operation will overflow).

In the afternoon, I was visiting the other side of the conference center with ISCA.  One paper of note there:

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing - They showed that by integrating simple cores closer to the memory system (somehow, as I missed this point), they pushed past the memory bandwidth of HMC (640GB/s) to about 2.3TB/s.  They focused on two pieces: an updated programming model for graphs and a prefetching system within their cores.  The model introduces async remote procedure calls that are sent to the Tesseract core near the data.  These messages accumulate in a queue until either a barrier or the queue is full.  While they accumulate, the prefetcher is requesting the appropriate data, so when the function fires, the data is available.  The prefetcher operates on two separate streams: the local processing that is sequential and generating the remote requests, and the remote requests received at this node.

Sunday, June 14, 2015

Conference Attendance FCRC - Day 1 - WCAE / SPAA

In Portland for the next 5 days attending the Federated Computing Research Conference, which is a vast co-location of the top ACM conferences.  For my part, this includes ISCA and PLDI.  Following registration and checking in as a student volunteer, I ducked in to the Workshop on Computer Architecture Education (WCAE).  There were a couple of presentations on different tools being used to teach architectural concepts.

Following the morning break, it was time for the keynote for SPAA, given by Hans-J Boehm, titled, "Myths and Misconceptions about Threads".  For example,
#include "foo"
f() {
    foo_t x, a;
    ...
    x = a; // Is this atomic?
}
Which led to the discussion of 'is assignment atomic?', and the audience tossed out increasingly complex examples of how it is not.  Fundamentally, the programming model is becoming "data-race free", and the specifications can treat races as "undefined behavior".  In general, a sequential program will view its execution as following the sequential consistency model, even if the hardware is executing the code with a weaker model.

What then should the programming language provide for atomics / synchronization?  Recall that the compiler has considerable flexibility in emitting the final program.  With data-race-free code, the compiler treats anything that is not an atomic as part of sequential code and therefore subject to any reordering that would still be valid sequentially.  The following example shows how this can go awry.  x is a global, and the compiler could substitute x anyplace tmp appears, because the model assumes "there are no races on x".  And if the program does happen to modify x in a racy manner, then the behavior is undefined.
bool tmp = x;         // intended: a single snapshot of the global x
if (tmp) f = new ...  // the compiler may instead re-read x here
...
if (tmp) f->foo();    // ...and again here, possibly seeing a different value
Gah!  The programmer wanted to take a snapshot of the global value, but could end up with a different result.  So the atomics are becoming more than just the "hacker's" way to quickly update shared values; they can be seen as annotations to the compiler that clearly encapsulate the shared state.  This means the type of x is not bool, but atomic<bool>.  Then the compiler knows the programmer's (likely) intent for this code.  And this rolls back to a deeper question of my research: "What could the system do more efficiently if it knew more about the programmer's intent?"
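A sketch of the intent-revealing version (Foo and f are stand-ins I've added to make the slide's fragment self-contained):

#include <atomic>

struct Foo { void foo() {} };

std::atomic<bool> x{false};   // shared flag, possibly set by another thread
Foo* f = nullptr;

void consumer() {
    bool tmp = x.load(std::memory_order_acquire);  // one explicit, named snapshot
    if (tmp) f = new Foo();
    // ... other work ...
    if (tmp) f->foo();   // guaranteed to act on the same snapshot as above
}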

Saturday, March 7, 2015

Conference Attendance SIGCSE 2015 - Day 2 / 3



I recognize that Day 1 afternoon went “missing”.  I presented my poster and that consumed the sum total of my time.  While I am happy with all that I achieved with my poster (writing IRB protocol, independent work, analyzing my teaching, et cetera), it was not considered as a finalist for the student research competition (SRC). Yet I received significant feedback and a number of follow-ons that I will have to try to evaluate the next time(s) I teach.  I have been doing an excellent job of networking and speaking with my colleagues.  And I have seen several exciting techniques to improve my teaching.

This was a small Birds of a Feather (BoF) session, but we discussed some approaches and things to know before attending your first conference, and not just for undergraduates.  However, in some cases, undergraduates are attending their local institution and may never have traveled any significant distance.  What do they really need to bring?



In traveling, take some time to prepare students.  Let them know what to expect.  For example, it is okay to miss some paper sessions, and even to return to your room entirely.  It is okay to ask questions 1:1.  Find groups where people are being introduced and join in.  Student volunteering, while it takes time, also gives you additional individuals that you will know.  Use the people you know to introduce you to others at the conference.

This is just what it sounds like: a Ruby-based framework that enables writing simple unit tests that are then applied to a full simulation of the executed assembly.

The presenter(s) were not at this poster, but it showed a high quality interface for seeing the scheduling of threads according to different scheduling policies.  The intent here was not to explore races and parallelism, but rather see how scheduling decisions are made in an OS.

I was not expecting this poster.  You are walking along and then see 4 Raspberry Pi's all networked together.  Raspberry Pis and HPC?!  A small setup, but it is an interesting development that takes advantage of the low cost Pi and still provide an HPC platform for students.

Plastic parts all worked together to form replicas of Pascal's mechanical calculator.  Interesting and student assembled.

Teams of 4 students; the approach is evaluated on courses from three years of the major (sophomore on up).  Teams are formed with CATME (particularly using dissimilar GPAs in a group), as well as partner selection (when possible).  Students provide peer evaluations after each stage of the project.  Significant data collection looked particularly at what students prefer the evaluation policy to be (between 100% of the grade for the group's work and 100% of the grade for the individual's contribution).  This question was asked repeatedly throughout the semester, which leads to whether student preferences change.  More senior students prefer more weight being attributed to the group.  The main predictor of the preferred grade split is the point in the course at which the question is asked: effectively, as soon as the teams are formed, students prefer to be graded primarily as a group.  A follow-on study is looking at experience with team projects, trust in the ability to evaluate individual contributions, and other questions.  This is a hopeful data point.

How do faculty become aware of and why do they try out teaching practices?  66 participants in CS, including chairs, tenure-track faculty, teaching faculty, and Ph.D. student instructors across 36 institutions.  First, the mental model of what an instructor does can differ significantly from what the instructor is actually doing.  Second, faculty can find out about practices through a variety of approaches, such as self-identifying that there is possible improvement in their teaching.  Faculty often trust other faculty like them (researchers to researchers, lecturers to lecturers).  Third, when adopting a practice, faculty need to evaluate its effectiveness (see also my poster, student feedback, etc.).  -- My efforts in this have been having different faculty (my recommendation letter writers) view my lectures / teaching, and thereby giving them demonstrations of different practices.
"We lost the war on cheating"  Instead, we have to meet with students such that they are demonstrating their understanding of the code.  The requirements of submissions: attribute your sources and understand your submission.  Enables students to work together, use all sources, develop interview skills.  Enables reuse of assignments.  Grading is now 40% correctness / 60% code interview.  Rubric for each interview.  Students should arrive early and have their laptop ready to present / explain.  Students were better able to learn and complete the assignments, as well as feedback for improvement.  Students also felt better able to learn the material by being able to collaborate and not constrained by a collaboration policy.  There are some stressors, such as TAs having to meet with hundreds of students, as well as their inconsistencies.  -- This was perhaps the most exciting new technique that I saw / heard about.