Elegant C: keynote

Wednesday, October 16, 2019

Conference Attendance - MICRO 52 - Day 2/3

These are rough notes from the other two keynotes.

Keynote: Bill Dally on Domain-Specific Accelerators

Moore's Law is over. Sequential performance is increasing at 3% per year. And cost per transistor is steady or increasing.

Most of the power is spent moving data around, so simple ISAs such as RISC are actually inefficient power-wise versus specialized operations. With special data types and operations, the hardware can be designed so that something taking 10s to 100s of cycles is done in 1. Memory bandwidth can still be the bottleneck, as "bits are bits".

Genome matching, via the Smith-Waterman algorithm, can be done in a single cycle for many bases (10), while a CPU would need 35 ALU ops and 15 load/stores. The specialized hardware uses 3.1pJ (10% of which is memory), while the CPU uses 81nJ.
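
For reference, here is a minimal software sketch of the Smith-Waterman recurrence that such hardware would evaluate for each cell. The scoring values (match, mismatch, gap) and the linear gap penalty are my assumptions for illustration, not values from the talk.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a new local alignment
                h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,      # gap in b
                h[i][j - 1] + gap,      # gap in a
            )
            best = max(best, h[i][j])
    return best
```

Each cell costs a few compares, adds, and loads on a CPU, which is the per-base work that the specialized datapath collapses into one step.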

Communication is expensive in power, so be small, be local. 5pJ/word for L1, 50pJ/word for LLC, and 640pJ/word for DRAM. And most of this power is driving the wires.
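
To make those numbers concrete, here is a small back-of-the-envelope sketch. The per-word energies are the ones quoted above; the access mix is a made-up assumption, and the model only counts the level that finally serves each access.

```python
# Energy per word by level, as quoted in the keynote (pJ/word).
ENERGY_PJ = {"L1": 5.0, "LLC": 50.0, "DRAM": 640.0}


def average_energy_pj(mix):
    """Average pJ/word, given the fraction of accesses served by each level."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(ENERGY_PJ[level] * share for level, share in mix.items())


def energy_share(mix, level):
    """Fraction of the total energy that is spent at the given level."""
    return ENERGY_PJ[level] * mix[level] / average_energy_pj(mix)


# Hypothetical access mix (an assumption, not a number from the talk).
mix = {"L1": 0.90, "LLC": 0.09, "DRAM": 0.01}
print(f"{average_energy_pj(mix):.1f} pJ/word on average")
print(f"{energy_share(mix, 'DRAM'):.0%} of the energy goes to DRAM")
```

With this made-up mix, the 1% of accesses that go to DRAM account for roughly 40% of the energy, which is the "be small, be local" point.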

Conventionally, sparse matrices need <1% set bits to be worth using due to the overhead of pointers, etc. However, special-purpose hardware can overcome this overhead.

A tensor core performs D = AB + C, so how can this be executed as an instruction? On a GPU, it takes 30pJ to fetch and decode an instruction and fetch its operands. So specialized instructions can then operate as efficiently as specialized hardware, but with that overhead. On a GPU, that power is ~20% overhead.

Keynote: An Invisible Woman: The Inside Story Behind the VLSI Microelectronic Computing Revolution in Silicon Valley

Conjecture: Almost all people are blind to innovations, especially ones by 'others' whom they did not expect to make innovations. ('others' = 'almost all people')

Basically, we rarely notice innovations, so they are ascribed to the perceived likely cause (cf. the Matthew effect or the Matilda effect). Credit for innovations is highly visible, and many awards are ascribed to people with certain reputations rather than the specific innovator.

Monday, October 14, 2019

Conference Attendance - MICRO 52 - Day 1

I am in Columbus, Ohio for MICRO 52. A third of the attendees, myself included, drove from "midwestern" universities.

Keynote: Rejuvenating Computer Architecture Research with Open-Source Hardware

Moore's Law is irrelevant now, as the cost per transistor has held steady since the 28nm technology node. The cost of any deployment depends on the development cost, and only at very large scales is the cost per transistor dominant. Given that, how can we reduce the cost of hardware development?

There was a Cambrian explosion of (RISC) ISAs from the mid-1980s on, with a great diversity of ISAs being created and competing. Then the Intel Pentium came out, which combined the CISC ISA with a translation into RISC micro-ops. This extinction event destroyed most of those ISAs.

Why does the instruction set architecture (ISA) matter? It is the dominant interface in the system, defining the interaction between software and hardware. But ISAs are currently proprietary, and tied to the fortunes of the company. Many ISAs have come and gone. And then each SoC (system on a chip) gets custom ISAs for each accelerator.

So there is now the RISC-V ISA that is open for use and development (which I wrote about here). The RISC-V foundation was formed in 2015 to be the neutral guardian of the specification and formal model. Based on this specification, there are both open-source and commercial implementations of the hardware as well as the software ecosystem.

ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs

DRAM is designed based on commands being sent in a specific order with appropriate timings. The oddity is that if specific commands and timings are used that violate the normal usage, then the DRAM module can perform certain operations, such as AND and OR, using three specially prepared rows (source x2 and destination).
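
A rough way to picture the three-row trick is as a bitwise majority vote. The sketch below assumes that activating three rows together leaves each bit at the majority value, and that the third row is preset to all zeros (for AND) or all ones (for OR). That preset is my reading of "specially prepared", and the function names are hypothetical.

```python
def majority3(a, b, c):
    """Bitwise majority of three rows, each modeled as an integer bit vector."""
    return (a & b) | (a & c) | (b & c)


def row_and(a, b):
    """AND of two source rows: majority with the third row preset to all 0s."""
    return majority3(a, b, 0)


def row_or(a, b, width=8):
    """OR of two source rows: majority with the third row preset to all 1s."""
    all_ones = (1 << width) - 1
    return majority3(a, b, all_ones)
```

For example, `row_and(0b1100, 0b1010)` gives `0b1000` and `row_or(0b1100, 0b1010)` gives `0b1110`. Real DRAM rows are thousands of bits wide, so one such operation acts on a whole row at once.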

Hybrid Skiplist: Combining the Best of Near-Data-Processing and Lock-Free Algorithms

This is a student research competition work that I want to highlight. The work takes skip-lists, multi-level linked lists that support more efficient traversals, which have been implemented both on near-data processing (NDP) systems and as lock-free data structures. The performance of the two implementations is comparable, but we should be able to do better. The observation is that lock-free gains by having the long, frequently-accessed links in the cache, while NDP gets the data items close. Therefore, let's combine the two approaches so the algorithm uses the lock-free approach on the long links, and leaves the rest in NDP. A dynamic approach then adapts which nodes are in the long list and promotes them, while demoting less frequently accessed elements.
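
To illustrate the split, here is a simplified, single-threaded skip list that counts how many hops a search takes on the long (upper) links versus the short (lower) links. It is only a sketch: the fixed `split_level`, the explicit node heights, and the class names are my assumptions, and it models neither the lock-free code nor the NDP hardware nor the dynamic promotion.

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per level


class SkipList:
    def __init__(self, max_height=4):
        self.max_height = max_height
        self.head = Node(float("-inf"), max_height)

    def insert(self, key, height):
        """Insert key with an explicit height (kept deterministic for the demo)."""
        node = Node(key, height)
        cur = self.head
        for level in range(self.max_height - 1, -1, -1):
            while cur.next[level] is not None and cur.next[level].key < key:
                cur = cur.next[level]
            if level < height:
                node.next[level] = cur.next[level]
                cur.next[level] = node

    def search(self, key, split_level=2):
        """Return (found, upper_hops, lower_hops).

        Hops at levels >= split_level stand in for the cached, lock-free part;
        hops below it stand in for the part left to NDP.
        """
        cur = self.head
        upper = lower = 0
        for level in range(self.max_height - 1, -1, -1):
            while cur.next[level] is not None and cur.next[level].key < key:
                cur = cur.next[level]
                if level >= split_level:
                    upper += 1
                else:
                    lower += 1
        nxt = cur.next[0]
        return (nxt is not None and nxt.key == key), upper, lower


sl = SkipList()
for key, height in [(1, 1), (2, 2), (3, 1), (4, 3), (5, 1), (6, 2), (7, 1), (8, 4)]:
    sl.insert(key, height)
```

Searching for 7 in this list takes one hop on the upper levels (to 4) and one on the lower levels (to 6) before finding the key.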

Applying Deep Learning to the Cache Replacement Problem

Let's apply machine learning to cache replacement. Offline, an ML model can perform better than the best replacement schemes, but this model requires lots of space, more than the cache itself. Current algorithms (such as Hawkeye) use just the current PC, whereas the observation is that the machine learning model includes history, so perhaps history can have value. Using this, they analyzed the history further and noticed that this history information does not have to be complete, nor does it have to be ordered. If it does not need to be ordered, then the history is a feature list (i.e., bitvector) and not a full list, so the history feature gives an index into a table of predictors for whether a line is cache friendly in usage.
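
The idea of an unordered history indexing a predictor table can be sketched as follows. This is my own illustration and not the paper's design: the table size, the multiplicative hash, the saturating counters, and the class name `HistoryPredictor` are all assumptions.

```python
class HistoryPredictor:
    """Table of saturating counters indexed by an unordered set of recent PCs."""

    def __init__(self, table_bits=10, counter_max=7):
        self.size = 1 << table_bits
        self.counter_max = counter_max
        self.threshold = (counter_max + 1) // 2
        self.table = [0] * self.size  # 0 means strongly cache-averse

    def _index(self, history):
        # XOR over the set of PCs: order and duplicates do not matter.
        idx = 0
        for pc in set(history):
            idx ^= (pc * 2654435761) % self.size
        return idx

    def train(self, history, friendly):
        """Move the counter up if the line turned out cache friendly, else down."""
        i = self._index(history)
        if friendly:
            self.table[i] = min(self.counter_max, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

    def predict(self, history):
        """Return True if lines seen with this history look cache friendly."""
        return self.table[self._index(history)] >= self.threshold
```

Because the index is built from a set, the histories `[0x400, 0x404, 0x408]` and `[0x408, 0x400, 0x404]` map to the same predictor entry, which is what makes the table small enough to be plausible in hardware.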

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs

This is a release of a Pin-like tool, but for GPUs. Using the framework, you can write specific instrumentation to be applied to CUDA kernels. The framework analyzes the kernel to find the specific instrumentation points and then recompiles / JITs the code to integrate the requested instrumentation into the kernel, without requiring the actual source code for the kernel. The instrumentation can, for example, record the specific instructions executed as counts or traces, from which you can build a simulator or error checker.

Friday, March 1, 2019

Conference Attendance: SIGCSE 2019 - Day 1.5

Back at SIGCSE again, this one the 50th to be held. Much of my time is spent dashing about and renewing friendships. That said, I made it to several sessions. I've included at least one author and linked to their paper.

Starting on day 2, we begin with the Keynote from Mark Guzdial

"The study of computers and all the phenomena associated with them." (Perlis, Newell, and Simon, 1967). The early uses of Computer Science were proposing its inclusion in education to support all of education (1960s). For example, given the equation "x = x0 + v*t + 1/2 a * t^2", we can also teach it as an algorithm / program. The program then shows the causal relation of the components. Integrating computer science thereby benefits the learning of other fields.

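As a small illustration of teaching that equation as a program, the sketch below computes position from the same terms. The function and parameter names are mine, not from the keynote.

```python
def position(x0, v, a, t):
    """Position after time t, from x = x0 + v*t + 1/2 * a * t^2."""
    start = x0                        # where the object begins
    from_velocity = v * t             # distance covered by the initial velocity
    from_acceleration = 0.5 * a * t ** 2  # extra distance from acceleration
    return start + from_velocity + from_acceleration


# Print how the position grows over the first few seconds.
for t in range(4):
    print(t, position(x0=1, v=2, a=4, t=t))
```

Naming each term separately is the point: a student can change one input and see which part of the result moves.
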
Do we have computing for all? Most high school students have no access, and even those who do have access do not take the classes.

Computing is a 21st century literacy. What is the core literacy that everyone needs? C.f. K-8 Learning Trajectories Derived from Research Literature: Sequence, Repetition, Conditionals. Our goal is not teaching Computer Science, but rather supporting learning.

For example, let's learn about acoustics. Mark explains the straight physics. Then he brings up a program (in a block-based language) that can display the sound reaching the microphone. So the learning came from the program, demonstration, and prediction. Not from writing and understanding the code itself. Taking data and helping build narratives.

We need to build more, try more, and innovate. To meet our mission, "to provide a global forum for educators to discuss research and practice related to the learning and teaching of computing at all levels."

Now for the papers from day 1:

Lisa Yan - The PyramidSnapshot Challenge

The core problem is that we only view student work through the completed snapshots. They extended Eclipse with a plugin to record every compilation, giving 130,000 snapshots from 2600 students. For those snapshots, they needed to develop an automated approach to classifying the intermediate states. They tried autograders and abstract syntax trees, but those could not capture the full space. But! The output is an image, so why not try using image classification? Of the 138531 snapshots, they generated 27220 images. Lisa then manually labeled 12000 of those images, into 16 labels that are effectively four milestones in development. Then, a neural network classifier classified the images. The milestones are plotted using a spectrum of colors (blue being start, red being perfect). Good students quickly reach the complete milestones. Struggling students are often in early debugging stages. Tinkering students (~73rd percentile on exams) take a lot of time, but mostly spend it on later milestones. From these, we can review assignments and whether students are in the declared milestones, or if another assignment structure is required.

For the following three papers, I served as the session chair.

Tyler Greer - On the Effects of Active Learning Environments in Computing Education

Replication study on the impact of using an active learning classroom versus traditional room. Using the same instructor to teach the same course, but using different classrooms and lecture styles (traditional versus peer instruction). The most significant factor was the use of active learning versus traditional, with no clear impact from the type of room used.

Yayjin Ham, Brandon Myers - Supporting Guided Inquiry with Cooperative Learning in Computer Organization

Taking a computer organization course with peer instruction and guided inquiry, can the peer instruction be traded for cooperative learning to emphasize further engagement and learning? Exploration of a model (program, documentation), then concept invention (building an understanding), then application (apply the learned concepts to a new problem). Reflect on the learning at the end of each "lecture". In back-to-back semesters, measure the learning gains from this intervention, as well as survey on other secondary items (such as engagement and peer support). However, the students in the intervention group did worse, most of which is accounted for by prior GPA. And across the other survey points, students in the intervention group rated lower. The materials used are available online.

Aman, et al - POGIL in Computer Science: Faculty Motivation and Challenges

Faculty try implementing POGIL in the classroom. Start with training, then implementing in the classroom, and continued innovation. Faculty want to see more motivation, retaining the material, and staying in the course (as well as in the program). Students have a mismatch between their learning and their perceived learning. There are many challenges and concerns from faculty about the costs of adoption.