Kenneth Czechowski defended his dissertation work this week.
He is trying to bring science to what is normally the art of diagnosing low-level performance issues, such as the cost of processing a sorted versus unsorted array or the i7 loop performance anomaly. I have much practice with this art, but I would really appreciate having more formalism behind these efforts.
One effort is to identify the cause of performance issues using the hardware performance counters. These counters are not well documented, and the existing tools are low-level. Instead, the idea is to develop a meta-tool that intelligently iterates over the counters, conducting a hierarchical event-based analysis: it starts with six top-level performance counters and then iterates over the more detailed counters that relate to the observed issue. The overall question is why the core is unable to retire its full bandwidth of 4 micro-ops per cycle.
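For a flavor of what the top level of such a hierarchical analysis computes, here is a small C sketch of the widely used Top-Down level-1 breakdown. This is my own illustration, not code from the dissertation, and the raw event names in the comments (uops issued, uops retired, recovery cycles, undelivered front-end uops) are assumptions about what would feed it on a particular core.

/* A minimal sketch of a Top-Down level-1 breakdown: given raw counter
 * values already read from the PMU, classify each of the 4 issue slots
 * per cycle as retiring, bad speculation, front-end bound, or back-end
 * bound. The event names in the comments are assumptions; check the
 * manual for your CPU. */
#include <stdio.h>

struct topdown { double retiring, bad_spec, frontend, backend; };

static struct topdown topdown_level1(double cycles,
                                     double uops_issued,     /* e.g. UOPS_ISSUED.ANY             */
                                     double uops_retired,    /* e.g. UOPS_RETIRED.RETIRE_SLOTS   */
                                     double recovery_cycles, /* e.g. INT_MISC.RECOVERY_CYCLES    */
                                     double fe_undelivered)  /* e.g. IDQ_UOPS_NOT_DELIVERED.CORE */
{
    double slots = 4.0 * cycles;  /* the core can retire 4 micro-ops per cycle */
    struct topdown t;
    t.retiring = uops_retired / slots;
    t.bad_spec = (uops_issued - uops_retired + 4.0 * recovery_cycles) / slots;
    t.frontend = fe_undelivered / slots;
    t.backend  = 1.0 - (t.retiring + t.bad_spec + t.frontend);
    return t;
}

int main(void)
{
    /* Hypothetical counter values, for demonstration only. */
    struct topdown t = topdown_level1(1e9, 2.1e9, 2.0e9, 5e6, 8e8);
    printf("retiring=%.2f bad_spec=%.2f frontend=%.2f backend=%.2f\n",
           t.retiring, t.bad_spec, t.frontend, t.backend);
    return 0;
}

Whichever of the four fractions dominates tells the tool which family of more detailed counters to examine next.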
Even if a tool can measure specific counters that indicate "bad" behavior, the next problem is that observing a "bad" behavior, such as bank conflicts, does not always correlate with performance loss; the offending operation must also impact the critical path.
The final approach is to take the program and build artificial versions of the hot code, for example by removing the memory or compute operations from the loop body. For some applications, a few loops account for most of the execution time. These loops can then be perturbed in ways that force certain resources to be exercised harder. For example, the registers in each instruction are scrambled so that the dependency graph changes, increasing or decreasing the available ILP while the instruction types themselves are unchanged.
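To make that concrete, below is a hypothetical C sketch of the kinds of variants one might build from a hot loop; in the actual work the perturbation is applied to the compiled hot code itself rather than to source like this, so treat it only as an illustration of the idea.

/* Hypothetical sketch of hot-loop perturbation (not from the thesis).
 * Comparing the runtimes of these variants hints at whether the loop is
 * bound by the memory system, by the arithmetic, or by the dependency
 * chain. */
#define N (1 << 20)

/* Original hot loop: loads, multiply-add, store. */
void original(float *a, const float *b, const float *c) {
    for (int i = 0; i < N; i++)
        a[i] = b[i] * c[i] + a[i];
}

/* Memory operations removed: the same arithmetic on register-resident
 * values. The serial dependence on a0 keeps the ILP low. */
float compute_only(float a0, float b0, float c0) {
    for (int i = 0; i < N; i++)
        a0 = b0 * c0 + a0;
    return a0;
}

/* Compute reduced: the multiply-add becomes a cheap add, preserving the
 * loads and stores so the memory access pattern is unchanged. */
void memory_only(float *a, const float *b, const float *c) {
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];
}

/* Dependency graph perturbed: four independent accumulators raise the
 * ILP while the instruction mix stays the same. */
float compute_unrolled(float b0, float c0) {
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 = b0 * c0 + s0;
        s1 = b0 * c0 + s1;
        s2 = b0 * c0 + s2;
        s3 = b0 * c0 + s3;
    }
    return s0 + s1 + s2 + s3;
}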
A discussion of how to do Computer Science well, particularly writing code and architecting program solutions.
Wednesday, December 16, 2015
Tuesday, December 1, 2015
Course Design Series (Post 3 of N) - Features versus Languages
One problem I had in preparing the curriculum for this fall was how to balance teaching programming languages against teaching the features that exist in programming languages. Some analogs to my course are surveys of programming languages, for example Michael Hewner's course. In such a course, study focuses on learning a diverse set of languages, particularly drawing from the logic and functional programming paradigms.
In general, the problem is that each feature needs examples from programming languages to illustrate how it is used; however, many of these languages have never been taught to the students. I could therefore spend the first month teaching the basics of Ada, Fortran, Pascal, et cetera, but in doing so the class essentially starts as a language-survey class. I considered this approach; however, the non-survey courses and the textbooks do not begin with the languages. They all begin with the features. Furthermore, I have taken the further step of avoiding the specific syntax of languages and instead devoting my time to teaching features and design.
Having accepted that the textbooks had a purpose and were rational in design, I followed their structure. In retrospect, I can see that upper-level students bring a base of knowledge about programming languages, so the need to cover language specifics is largely avoided. I have still taken some class periods to delve into specific languages; however, this is the exception rather than the rule. I do note that in the future I would spend some time teaching or requiring specific languages. Without such a requirement, I have been faced with grading a myriad of languages and find myself unable to assign problems or questions based on specific languages.
Tuesday, November 10, 2015
CSE Distinguished Lecture - Professor David Patterson - Instruction Sets want to be Free: The Case for RISC-V
As with every other talk on computer architecture, first we need to revisit history. Only by knowing where we came from can we envision where to go.
History of ISA:
The IBM System/360 was proposed to unify IBM's diverse lines of mainframes. Slowly, ISAs started adding more instructions to support more things (see below).
The Intel 8086 was a crash ISA design program to cover for Intel's original ISA effort, which was delayed. IBM then wanted to adopt the Motorola 68000, but that chip was late, so the IBM PC used the 8088.
In the 1980s, studies found that if compiled code used only the simple instructions, programs ran faster than when the full instruction set was used. So why not design a simple ISA?
RISC (Reduced Instruction Set Computing) was that ISA. The processor is simpler and faster. Secretly, all processors are now RISC internally; for example, Intel and AMD processors translate their x86 ISAs into internal RISC-like micro-operations.
Perhaps several simple instructions could execute together, allowing the architecture to be simplified further, with the compiler finding these groups rather than spending time and energy while the program is running. This is VLIW (very long instruction word), where several simple instructions are merged into one long instruction.
Open ISA:
Computer architecture is reaching limits such that future processor gains will increasingly come from custom and dedicated logic. IP issues limit the ability to do research on ISAs; we are forced to write simulators that may or may not mimic actual hardware behavior.
Proprietary ISAs continue to grow, by roughly 2 instructions per month. This growth provides a marketing reason to purchase the new cores, beyond the architectural improvements alone.
Instead, let's develop a modular, open ISA using many of the ideas from existing designs: for example, atomic instructions in both the fetch-and-op and the load-linked / store-conditional styles, or a compressed instruction format so that common instructions can use 16 bits rather than 32 (as ARM does with Thumb).
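As an aside on those two atomic styles, here is a small C sketch using standard C11 atomics (my choice of illustration, not anything mandated by RISC-V): the same increment written as a single fetch-and-op and as a load plus compare-and-swap retry loop of the kind that load-linked / store-conditional enables.

/* Sketch of the two atomic styles in portable C11; the compiler lowers
 * these to single atomic instructions or to retry loops depending on
 * what the target ISA provides. */
#include <stdatomic.h>

atomic_int counter;

/* Fetch-and-op style: one atomic read-modify-write operation. */
void add_fetch_op(int delta) {
    atomic_fetch_add(&counter, delta);
}

/* Load-linked / store-conditional style: read, compute, then retry the
 * conditional update until no other thread has intervened. */
void add_ll_sc_style(int delta) {
    int old = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &old, old + delta)) {
        /* old has been refreshed with the current value; retry */
    }
}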
RISC-V has support for the standard open-source software: compilers (gcc, LLVM), Linux, etc. It also provides synthesizable core designs, simulators, etc.
Labels: architecture, CISC, history, open source, presentation, RISC, VLIW
Tuesday, November 3, 2015
PhD Defense Samantha Lo - Design and Evaluation of Virtual Network Migration Mechanisms on Shared Substrate
PhD Candidate Samantha Lo defended her dissertation work today.
Virtual networks (VNs) may require migration due to maintenance, resource balancing, or hardware failures. Migration occurs when the assignment of virtual to physical network resources changes. VN assignment has two aspects: policy of where to assign, and mechanism of how to do so. Similar questions exist for migration or changing the assignment.
This dissertation focuses on the last piece: the mechanism of how to change the assignment. By the time virtual network nodes are migrated, the policy aspect has already determined that a migration should occur and where the nodes should now be placed.
Chapter 3 explored scheduling algorithms in a simulated environment where layers 2 and 3 can be changed. The goal is to determine a migration schedule that minimizes both the overhead / disruption of the migration and the time required to perform it. For example, Local Minimum Cost First (LMCF) selects one node at a time to migrate, whereas Maximal Independent Set tries to identify multiple nodes that can move at once to reduce the total migration time.
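As a rough illustration of the second idea (my own sketch, not the algorithm from the dissertation), a greedy pass over a conflict graph, where an edge means two virtual nodes cannot move at the same time, yields a maximal independent set of nodes that can migrate together in one round.

/* Hypothetical sketch: greedily pick a maximal independent set of
 * virtual nodes that can migrate in the same round. conflict[i][j] is
 * nonzero if nodes i and j cannot move simultaneously. */
#define MAX_NODES 64

int pick_migration_round(int n, const int conflict[MAX_NODES][MAX_NODES],
                         int chosen[MAX_NODES]) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        int ok = 1;
        for (int j = 0; j < count; j++)
            if (conflict[i][chosen[j]]) { ok = 0; break; }
        if (ok)
            chosen[count++] = i;   /* i conflicts with no node already chosen */
    }
    return count;                  /* number of nodes migrated together this round */
}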
Chapter 4 explored an actual implementation in PlanetLab, where there is access to layer 3. Virtual networks were placed within PlanetLab. When a network migrated, it experienced up to 10% packet loss. However, if the gateways for the VNs are synchronized so they migrate closer together in time, the loss is lessened.
Chapter 5 addressed the performance issues raised in the previous work through transport- and application-layer changes. When a VN migrates, the new location may have different physical characteristics. Analysis of the TCP traffic showed that on migration, packet transmission rates dropped dramatically as the window size fell. How can this be avoided?
1) Controller notifies the applications to delay packet transmission to avoid packet loss.
2) Gateway pauses and buffers traffic.
Under the latter scheme, the gateway fools the user into thinking that the TCP connection is still working while traffic is actually being buffered. Furthermore, the network also uses Split TCP, such that each "->" in user->gateway->gateway->user is a separate connection. Split TCP hides the end-to-end RTT from the user, which potentially permits the gateway to send data faster on resume.
After the command to pause data transmission is sent, the system must wait a short time before actually migrating the network. Otherwise, packets still in flight will be lost as the network migrates, forcing TCP to retransmit with its exponential backoff. That backoff can then delay the resumption of data transmission after migration, imposing additional overhead. By placing a delay between pausing and migrating, the network is quiesced and resumes more quickly.
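Putting the Chapter 5 pieces together, the control sequence amounts to something like the following C sketch; every helper function here is a hypothetical name for an action described above, not an API from the dissertation.

/* Hypothetical sketch of the pause -> quiesce -> migrate -> resume
 * sequence. All helper functions are made-up names for the steps
 * described above. */
#include <unistd.h>

void notify_gateways_pause(int vn_id);
void remap_virtual_nodes(int vn_id, int new_substrate);
void notify_gateways_resume(int vn_id);

void migrate_virtual_network(int vn_id, int new_substrate, unsigned quiesce_ms) {
    notify_gateways_pause(vn_id);               /* gateways buffer traffic; Split TCP keeps
                                                   the user-facing connections alive */
    usleep(quiesce_ms * 1000);                  /* let in-flight packets drain so nothing is
                                                   lost and TCP never backs off */
    remap_virtual_nodes(vn_id, new_substrate);  /* the actual migration */
    notify_gateways_resume(vn_id);              /* flush buffered traffic over the new paths */
}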
Friday, October 2, 2015
SCS Lecture Series: Solver-Aided Declarative Programming
Intersection of Programming Languages, Databases, and analytics (Machine Learning, etc)
Businesses are big systems that can then be modeled: financial, data, supply chain, consumer demand, etc. Simple models can be analytical formulas or spreadsheets prepared by the domain expert. Or an enterprise system of database + Java layer + .... Consider the thousands or millions of SKUs plus thousands of stores. The goal is that the domain expert can build and maintain the model.
For example, one client had 234 billion weekly forecasts (500k SKUs x 9,000 stores x 52 weeks). Consider the impact of BOGO deals and placing items on endcaps: how much does each improve sales? The predictions had lower error than competing forecasts and resulted in millions of dollars in savings from optimized inventory. The customer also benefits from using cloud computing rather than having to buy the necessary computing resources. Customers increasingly trust the cloud to be secure, as companies such as Target, Home Depot, and T-Mobile have suffered data breaches. Who do you think has more resources to secure the data: Google or Target?
Recall that Excel is the world's most popular IDE hosting the world's most popular programming language. Thus the audience for this work is business people, not computer scientists. So the work provides a programming language, LogiQL, which inherits from Datalog. The language is not Turing-complete, but that restriction makes many valuable analyses decidable that would otherwise be undecidable. It also provides vastly more efficient clique finding in graphs.
Integrity constraints describe the possible states for the data and its relations, so the database knows what is possible among its facts. With weighted constraints, it can instead find which facts violate the fewest constraints. The language also includes an existential second-order (ESO) operator: rather than asking whether a fact exists, a query can ask whether an entire relation exists that satisfies the constraints. For example, ESO can be invoked to find a valid color relation for graph coloring.
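To give a flavor of what the ESO operator asks the solver to do, here is a plain C illustration (my own, and obviously not LogiQL): instead of checking a single fact, the system searches for an entire relation, here a color assignment for every node, that satisfies all of the constraints.

/* Hypothetical illustration of the ESO idea in plain C: search for a
 * whole relation color[node] such that no edge joins two nodes of the
 * same color. A solver performs this search; here it is brute force. */
#include <stdio.h>

#define NODES 4
#define COLORS 3

static const int edge[NODES][NODES] = {   /* small example graph */
    {0,1,1,0}, {1,0,1,1}, {1,1,0,1}, {0,1,1,0}
};

static int valid(const int color[NODES]) {
    for (int u = 0; u < NODES; u++)
        for (int v = u + 1; v < NODES; v++)
            if (edge[u][v] && color[u] == color[v])
                return 0;                  /* constraint violated */
    return 1;
}

static int find_coloring(int color[NODES], int node) {
    if (node == NODES)
        return valid(color);               /* does this relation satisfy the constraints? */
    for (int c = 0; c < COLORS; c++) {
        color[node] = c;
        if (find_coloring(color, node + 1))
            return 1;
    }
    return 0;
}

int main(void) {
    int color[NODES];
    if (find_coloring(color, 0))
        for (int u = 0; u < NODES; u++)
            printf("node %d -> color %d\n", u, color[u]);
    else
        printf("no valid coloring exists\n");
    return 0;
}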
Friday, August 28, 2015
Repost: Incentivizing Active Learning in the Computer Science Classroom
Studies have shown that using active learning techniques improves student learning and engagement. Anecdotally, students have brought up these points to me from my use of such techniques. I even published a study at SIGCSE on using active learning with undergraduate and graduate students. That study brought up an interesting point, to which I will return shortly: undergraduate students prefer these techniques more than graduate students do.
Mark Guzdial, far more senior than me, recently challenged Georgia Tech (where we both are) to incentivize the adoption of active learning. One of his recent blog posts lists the pushback he received, Active Learning in Computer Science. Personally, as someone who cares about the quality of my teaching, I support these efforts although I do not get to vote.
Faculty members at R1 institutions, such as Georgia Tech, primarily spend their time on research; however, they are not research scientists, so they are also called upon to teach, and you would expect them to do this well. In meeting with faculty candidates, there was one who expressed that the candidate's mission as a faculty member would be to create new superstar researchers. Classes were irrelevant to this candidate as a student, so there would be no need to teach well: the highest end (telos) of research justifies focusing solely on the students who succeed despite their instruction, just as the candidate did. Mark's blog post suggests that one day Georgia Tech or other institutions may be sued for this sub-par teaching.
What about engagement? I (along with many students and faculty) attended a visiting speaker talk earlier this week and was able to pay attention to the hour-long talk even though it was effectively a lecture. And for this audience, it was a good talk. The audience then takes away the meta-lesson that lectures can be engaging; after all, we paid attention. But we are experts in this subject! Furthermore, for most of us there, this is our subfield of Computer Science. Of course we find it interesting; we have repeatedly chosen to study it.
For us, the material we teach has become self-evidently interesting. I return to the undergraduate and graduate students that I taught. Which group is closer to being experts? Who has more experience learning despite the teaching? Who preferred that I just lecture? And in the end, both groups learned the material better.
Edit: I am by no means condemning all of the teaching at R1's or even Georgia Tech. There are many who teach and work on teaching well. The Dean of the College of Computing has also put some emphasis on this through teaching evaluations. Mark's post was partially noting that teaching evaluations are not enough, we can and should do more.
Monday, August 17, 2015
Course Design Series (Post 2 of N): Choosing a Textbook
Having now read both Programming Language Pragmatics, Third Edition and Concepts of Programming Languages (11th Edition), I have settled on the former as my textbook for the fall. I do not find either book ideally suited, and I wish that the Fourth Edition of Pragmatics were being released this summer and not in November, which is why it still has not arrived.
For the choice of which textbook to use, it was "Concepts" to lose; after 11 editions, the text should be well revised. I dislike reading examples in a book and questioning whether the code would even compile. Beyond that, the examples felt quaint and contrived.
(edit) For example, I was reviewing the material on F# and copied in an example from the text:
let rec factorial x =
if x <= 1 then 1
else n * factorial(n-1)
Does anyone else notice that the function parameter is x on the first two lines and n on the last?!
Before the Concepts book is written off entirely, it has many valuable aspects. I enjoyed reading about the history of programming languages, especially for exposing me to Plankalkül. The book also takes a valuable tack by regularly examining the trade-offs between different designs and features. This certainly helped inform my teaching of the material.
Fundamentally, when I looked at the price of the two textbooks, the benefits of the newer Concepts textbook could not outweigh its nearly doubled price tag. Most of its positive points are small things and can be covered as addenda to the material.
(FCC note - Concepts of Programming Languages was provided free to me by the publisher.)