Thursday, December 17, 2015

Teaching Inclusively in Computer Science

When I teach, I want everyone to succeed and master the material, and I think that everyone in the course can.  I only have so much time to work with and guide the students through the material, so how should I spend this time?  What can I do to maximize student mastery?  Are there seemingly neutral actions that might impact some students more than others?  For example, before class this fall, I would chat with the students who were there early, sometimes about computer games.  Do those conversations create an impression that "successful programmers play computer games"?  With these questions in mind, I want to revisit a pair of posts from the past year about better including the students.

The first is a Communications of the ACM post from the beginning of this year.  It listed several seemingly neutral decisions that can bias against certain groups.  One recommendation: maintain a tone of voice that suggests every question is valuable, not one of "I've already explained that, so why don't you get it".  As long as students are doing their part in trying to learn, the failure is on me, the communicator.

The second is a Mark Guzdial post on Active Learning.  The proposition is that traditional lecture-style teaching advantages the privileged students.  A key thing to remember is that most of us instructors were the privileged ones, so even though I and others have "succeeded" in that setting, it may have been despite the system and not because of the teaching.  Regardless of the instructor, the teaching techniques themselves carry biases toward different groups.  So if we want students to master the material, then perhaps we should teach differently.

Active learning has a growing body of research showing that these teaching techniques help more students succeed at mastering a course, especially the less privileged students.  Perhaps slightly less material is "covered", but students will learn and retain far more.  Isn't that better?


Wednesday, December 16, 2015

PhD Defense - Diagnosing performance limitations in HPC applications

Kenneth Czechowski defended his dissertation work this week.

He is trying to bring science to the usual art of diagnosing low-level performance issues, such as the classic "processing a sorted array" and "i7 loop performance anomaly" questions.  I have much practice with this art, but I would really appreciate having more formalism behind these efforts.

One effort is to identify the cause of performance issues using the hardware performance counters.  These counters are not well documented, and the existing tools are low-level.  Instead, he develops a meta-tool that intelligently iterates over the counters, conducting a hierarchical event-based analysis: start with six top-level performance counters, then iterate on the more detailed counters that relate to the observed issue, for example, when diagnosing why the core is unable to retire the full bandwidth of 4 micro-ops per cycle.
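For flavor, here is a minimal sketch (my illustration, not Ken's tool) of reading one hardware counter on Linux via the perf_event_open syscall; a meta-tool in this spirit would iterate measurements like this, drilling into more detailed counters whenever a top-level one looks suspicious.

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long perf_open(struct perf_event_attr *attr)
{
    /* measure the calling process (pid 0) on any cpu (-1) */
    return syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* start coarse: retired instructions */
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = perf_open(&attr);
    if (fd < 0) return 1;                      /* no permission or unsupported */

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile long sink = 0;                    /* region of interest */
    for (long i = 0; i < 1000000; i++) sink += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t count;
    read(fd, &count, sizeof(count));
    printf("instructions retired: %llu\n", (unsigned long long)count);
    close(fd);
    return 0;
}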

Even if a tool can provide measurements of specific counters that indicate "bad" behavior, the next problem is that observing certain "bad" behaviors, such as bank conflicts, does not always correlate with performance loss, because the operation must impact the critical path to matter.

The final approach is to take the program and build artificial versions of the hot code, such as removing the memory or compute operations from the loop body.  For some applications, several loops account for most of the time.  These loops can then be perturbed in ways that force certain resources to be exercised further.  For example, the registers in each instruction are scrambled so that the dependency graph changes, increasing or decreasing the ILP while the instruction types themselves are unchanged.
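To make the register-scrambling idea concrete, here is a sketch (my example, not from the defense) of two loops with the same instruction mix but different dependency graphs, and thus different ILP:

double sum_serial(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];              /* every add depends on the previous one */
    return s;
}

double sum_ilp(const double *a, int n)
{
    double s0 = 0.0, s1 = 0.0;  /* two independent dependency chains */
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) s0 += a[i];      /* leftover element when n is odd */
    return s0 + s1;
}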

Tuesday, December 1, 2015

Course Design Series (Post 3 of N) - Features versus Languages

One problem I had in preparing the curriculum for this fall is how to balance teaching programming languages versus the features that exist in programming languages.  Analogs to my course have generally been surveys of programming languages, for example, Michael Hewner's course.  In such a course, study is focused strongly on learning a diverse set of languages, particularly pulling from the logic and functional programming paradigms.

In general, the problem is that each feature needs examples from programming languages to illustrate how it is used; however, many of these languages have never been taught to the students.  Therefore, I could spend the first month teaching the basics of Ada, Fortran, Pascal, et cetera.  But in doing this, the class essentially starts as a language class.  I had considered this approach; however, the non-survey courses and the textbooks do not begin with the languages.  They all begin with the features.  I have therefore taken the approach of avoiding the specific syntax of languages and instead devoting my time to teaching features and design.

Having then accepted that the textbooks had a purpose and were rational in design, I followed their structure.  In retrospect, I can see that having upper-level students provides a base of knowledge about programming languages such that the need to cover specifics is avoided.  I have still taken some class periods to delve into specific languages; however, this is the exception rather than the rule.  I do note that in the future I would spend some time teaching / requiring specific languages.  Without this requirement, I have been faced with grading submissions in a myriad of languages and find myself unable to assign problems / questions based on specific languages.

Tuesday, November 10, 2015

CSE Distinguished Lecture - Professor David Patterson - Instruction Sets want to be Free: The Case for RISC-V

Similar to every other talk on Computer Architecture, first we need to revisit history.  Only by knowing from where we came do we envision where to go.

History of ISA:
The IBM System/360 was proposed to unify the company's diverse lines of mainframes.  Slowly the ISAs started adding more instructions to support more things (see below).

The Intel 8086 was a crash ISA design program to cover for the original ISA design that was delayed.  IBM then wanted to adopt the Motorola 68000, but that chip was late, so the IBM PC used the 8088.

In the 1980s, studies found that if compiled code used only the simple instructions, programs ran faster than when using all of the instructions.  So why not design a simple ISA?

RISC (Reduced Instruction Set Computing) was that ISA.  The processor is simpler and faster.  Secretly, all processors are now RISC internally; for example, Intel and AMD processors translate from their x86 ISAs into internal RISC operations.

Perhaps several simple instructions could execute together; then the architecture could be simplified further, and the compiler could find these groupings rather than the processor spending time and energy while the program is running.  This ISA style is VLIW (very long instruction word), where many simple instructions are merged into one long instruction.

Open ISA:
Computer Architecture is reaching certain limits such that processor gains will soon come from custom and dedicated logic.  IP issues limit the ability to do research on ISAs.  We are forced to write simulators that may or may not mimic actual hardware behavior.

Proprietary ISAs are continuing to grow in size, about 2 instructions per month.  This provides the marketing reason to purchase the new cores, rather than just the architectural improvements.

Instead, let's develop a modular ISA using many of the ideas from existing designs.  For example, atomic instructions for both fetch-and-op and load-linked / store-conditional.  Or a compressed instruction format, so certain instructions can use a 16-bit encoding rather than 32 bits (see ARM).
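As a sketch of those two atomic styles (using C11 atomics; which instructions actually get emitted depends on the compiler and target), fetch-and-op corresponds to a single RISC-V amoadd-style instruction, while a compare-exchange retry loop corresponds to the lr/sc pair:

#include <stdatomic.h>

atomic_int counter;

void add_amo(int v)
{
    atomic_fetch_add(&counter, v);  /* fetch-and-op: e.g., amoadd.w */
}

void add_llsc(int v)
{
    int old = atomic_load(&counter);
    /* retry loop in the load-linked / store-conditional style (lr.w/sc.w) */
    while (!atomic_compare_exchange_weak(&counter, &old, old + v))
        ;  /* on failure, old is reloaded with the current value */
}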

RISC-V has support for the standard open-source software: compilers (gcc, LLVM), Linux, etc.  It also provides synthesizable core designs, simulators, etc.

Tuesday, November 3, 2015

PhD Defense Samantha Lo - Design and Evaluation of Virtual Network Migration Mechanisms on Shared Substrate

PhD Candidate Samantha Lo defended her dissertation work today.

Virtual networks (VNs) may require migration due to maintenance, resource balancing, or hardware failures.  Migration occurs when the assignment of virtual to physical network resources changes.  VN assignment has two aspects: policy of where to assign, and mechanism of how to do so.  Similar questions exist for migration or changing the assignment.

This dissertation focuses on the last piece: the mechanism of how to change the assignment.  By the time virtual network nodes migrate, the policy aspect has already identified that a migration should occur and where the nodes should now be placed.

Chapter 3 explored scheduling algorithms in a simulated environment, where layers 2 and 3 can be changed.  The goal is to determine a migration schedule that minimizes the overhead / disruption of the migration and the time required to perform it.  For example, Local Minimum Cost First (LMCF) selects one node at a time to migrate.  In contrast, a Maximal Independent Set approach tries to identify multiple nodes to move at once to reduce the migration time (sketched below).
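Here is a greedy sketch of the independent-set idea (my illustration, assuming a hypothetical conflict graph where an edge means two nodes cannot migrate in the same round; the dissertation's actual algorithm may differ):

#include <stdbool.h>

#define N 8
bool conflict[N][N];   /* conflict[i][j]: nodes i and j cannot move together */

int pick_round(bool chosen[N])
{
    int count = 0;
    for (int i = 0; i < N; i++) chosen[i] = false;
    for (int i = 0; i < N; i++) {
        bool ok = true;
        for (int j = 0; j < N; j++)
            if (chosen[j] && conflict[i][j]) { ok = false; break; }
        if (ok) { chosen[i] = true; count++; }   /* migrate i this round */
    }
    return count;   /* number of nodes migrated concurrently */
}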

Chapter 4 explored an actual implementation in PlanetLab, where there is access to layer 3.  Virtual networks were placed within PlanetLab.  When a network migrated, it experienced up to 10% packet loss.  However, if the gateways for the VNs were synchronized to migrate closer in time, the loss lessened.

Chapter 5 addressed the performance issues raised in the previous work through transport- and application-layer changes.  When a VN migrates, the new location may have different physical characteristics.  Analysis of the TCP traffic showed that on migration, the packet transmission rates dropped dramatically as the window size fell.  How can this be avoided?

1) Controller notifies the applications to delay packet transmission to avoid packet loss.
2) Gateway pauses and buffers traffic.

Under the latter scheme, the gateway fools the user into thinking that the TCP connection is still working when it is instead being buffered.  Furthermore, the network is also using Split TCP, such that each "->" is a separate connection in user->gateway->gateway->user.  The Split TCP hides the RTT from the user, which potentially permits the gateway to send data faster on resume.

After the command to pause data transmission is sent, the system must wait a small amount of time before actually migrating the network.  Otherwise, there are packets in flight that will be lost as the network migrates.  These packets will force TCP to attempt to retransmit using its exponential backoff.  This backoff can then delay the resumption of data transmission after migration, imposing additional overhead.  By placing a delay between pausing and migrating, the network is quiesced and will resume more quickly.

Friday, October 2, 2015

SCS Lecture Series: Solver-Aided Declarative Programming

This work sits at the intersection of programming languages, databases, and analytics (machine learning, etc.).

Businesses are big systems that can then be modeled: financial, data, supply chain, consumer demand, etc.  Simple models can be analytical formulas or spreadsheets prepared by the domain expert.  Or an enterprise system of database + Java layer + ....  Consider the thousands or millions of SKUs plus thousands of stores.  The goal is that the domain expert can build and maintain the model.

For example, one client had 234 billion weekly forecasts (500k SKUs x 9000 stores x 52 weeks).  Consider the impact of BOGO deals and placing items on endcaps: how much do these improve sales?  The predictions had lower error than competing forecasts, resulting in millions of dollars in savings from optimized inventory.  There is also benefit to the customer in using cloud computing versus requiring the customer to buy the necessary computing resources.  Customers increasingly trust the cloud to be secure, as companies such as Target, Home Depot, and T-Mobile have suffered data breaches.  Who do you think has more resources to secure the data: Google or Target?

Recall that Excel is the world's most popular IDE with the most popular programming language.  Thus the audience for this work is the business people, not the computer scientists.  So they have a programming language, LogiQL, which inherits from Datalog.  The language is not Turing-complete, but in exchange it provides many valuable analyses that would otherwise be undecidable, for example, vastly more efficient clique finding in graphs.

Integrity constraints give the possible states for the data and the relations; therefore, the database knows what is possible among its facts.  Or, with weighted constraints, it knows which facts violate the fewest constraints.  They also include an existential second-order (ESO) operator: rather than asserting that a fact exists, ESO quantifies over whether a relation exists.  For example, ESO can be invoked to find a valid color relation for graph coloring.

Friday, August 28, 2015

Repost: Incentivizing Active Learning in the Computer Science Classroom

Studies have shown that using active learning techniques improves student learning and engagement.  Anecdotally, students have brought up these points to me from my use of such techniques.  I even published a study at SIGCSE on using active learning with undergraduate and graduate students.  This study brought up an interesting point, to which I will return shortly: undergraduate students prefer these techniques more than graduate students do.

Mark Guzdial, far more senior than me, recently challenged Georgia Tech (where we both are) to incentivize the adoption of active learning.  One of his recent blog posts lists the pushback he received, Active Learning in Computer Science.  Personally, as someone who cares about the quality of my teaching, I support these efforts although I do not get to vote.

Faculty members at R1 institutions, such as Georgia Tech, primarily spend their time on research; however, they are not research scientists, and therefore they are also called upon to teach.  And so you would expect that they would do this well.  In meeting with faculty candidates, there was one who expressed that the candidate's mission as a faculty member would be to create new superstar researchers.  Classes had been irrelevant to this candidate as a student; therefore, there would be no need to teach well, as the highest end (telos) of research justifies focusing solely on the students who succeed despite their instruction, just as the candidate did.  Mark's blog post suggests that one day Georgia Tech or other institutions may be sued for this sub-par teaching.

What about engagement?  I (along with many students and faculty) attended a visiting speaker talk earlier this week and was able to pay attention to the hour-long talk even though it was effectively a lecture.  And for this audience, it was a good talk.  The audience then takes the meta-lesson that lectures can be engaging; after all, we paid attention.  But we are experts in this subject!  Furthermore, for most of us there, this is our subfield of Computer Science.  Of course we find it interesting; we have repeatedly chosen to study it.

For us, the material we teach has become self-evidently interesting.  I return to the undergraduate and graduate students that I taught.  Which group is closer to being experts?  Who has more experience learning despite the teaching?  Who preferred me to just lecture?  And in the end, both groups learned the material better.

Edit: I am by no means condemning all of the teaching at R1's or even Georgia Tech.  There are many who teach and work on teaching well.  The Dean of the College of Computing has also put some emphasis on this through teaching evaluations.  Mark's post was partially noting that teaching evaluations are not enough; we can and should do more.

Monday, August 17, 2015

Course Design Series (Post 2 of N): Choosing a Textbook

Having now read both Programming Language Pragmatics, Third Edition and Concepts of Programming Languages (11th Edition), I have settled on the former as my textbook for the fall.  I do not find either book ideally suited, and I wish that the Fourth Edition were being released this summer and not in November (which is why my copy still hasn't arrived).

The choice of textbook was "Concepts" to lose: with 11 editions, the text should be well revised.  Yet I dislike reading the examples in a book and questioning whether the code would compile.  Beyond which, the examples felt quaint and contrived.

(edit) For example, I was reviewing the material on F# and copied in an example from the text:
let rec factorial x =
  if x <= 1 then 1
  else n * factorial(n-1)
Does anyone else notice that the function parameter is x on the first two lines and n on the last?!

Before the Concepts book is written off entirely, there are many valuable aspects.  I enjoyed reading about the history of programming languages, especially for exposing me to Plankalkül.  The work also took a valuable tack by regularly looking at the trade-offs between different designs and features.  This certainly helped inform my teaching of the material.

Fundamentally, when I looked at the price of the two textbooks, the benefits of using the newer Concepts textbook could not outweigh the nearly doubled price tag.  Most of the positive points are small things and can be covered as addendums to the material.

(FCC note - Concepts of Programming Languages was provided free to me by the publisher.)

Wednesday, July 15, 2015

PhD Defense Yacin Nadji - Understanding DNS-based Criminal Infrastructure for Informing Takedowns

Yacin Nadji, a PhD candidate in security at Georgia Tech, successfully defended his dissertation work today.

How does one disable a botnet?  It is difficult to identify and repair each infected machine individually.  Therefore, targeting the command and control (C&C) servers can instead break the linkage between the infected machines and the malicious controller.

Manual identification is time-consuming and can lead to collateral damage.  Automation is required to enumerate the machines, evaluate the threat, identify the takedown mechanism, and determine the potential collateral damage of the takedown.  Using a dataset of DNS registrations over time (from Damballa), the tools were tested across this sample of the Internet.

APTs (advanced persistent threats) are particularly troublesome, as these machines persist and change their presence over time according to the botnet controller.  The C&C machines also attempt to go dark by changing their IP resolution to localhost (127.0.0.1), thereby minimizing their observed signature by only having network traffic when an attack is required.  This leads to a suite of detection features that can identify the actual C&C machines, such as having short-lived IP addresses, changing the domain name to NULL or localhost, and varying the IP address across a diverse set of infrastructure and geographic locations.

He then developed a machine learning classifier, initially with a ground truth of 50k APT records.  The features are scored and then run through different models using a 90/10 split of the ground truth dataset.  The following results are only approximate, as I was trying to copy them during the presentation.

Model                        Accuracy (%)   True Positive Rate (%)   False Positive Rate (%)
Naive Bayes                  70             91                       40
General Linear Regression    98             93                       1
Random Forest                99             97                       0.04

He then applied the classifier to the full dataset of 300 million records.  These are clustered into 1.1 million clusters, of which ~700 are above 0.8 confidence of being APTs.  At 90% confidence, the clusters all contain fewer than 1000 domain names.

How then do botnets attempt to evade detection?  The infected machines generally use DNS to look up their C&C machines; however, the lookups can occasionally be spurious or resolve to legitimate IPs.  The machines could use peer-to-peer communication, but this requires active connections that are often blocked or restricted by networks (as they are for "legitimate" uses such as BitTorrent).

The suite of tools also operates on the malware running in VMs, whereby it works through possible takedown mechanisms and then observes the infection's response to each, thereby identifying other, possibly unused, communication approaches.  For most infections, this takes on the order of hours to enumerate through the approaches; however, some can take days.

Open Problems:

  • Attributing the botnet to physical entities
  • Targeting P2P-based botnets

Wednesday, June 17, 2015

Conference Attendance FCRC - Day 5 - Plenary Summary

Plenary Talk today, which pulls together all of the conference attendees.  Sunday's talk was based in databases, with Michael Stonebraker speaking on his Turing-award winning work.  Monday's talk discussed interdisciplinary work, primarily centered in CS theory, and was given by Andrew Yao (a prior Turing Award winner).  On Tuesday, Olivier Temam discussed neural networks in hardware, which focused on his work and efforts to better model or mimic the capabilities of the brain.

The F# Path to Relaxation -
There are opportunities to introduce new work toward relaxing and improving.  Or perhaps create opposing camps.  Thesis <-> Antithesis ==> synthesis.  Or Functional <=> Interop.  Back in 2003, functional languages were isolated, non-interoperable, using their own VMs.  F# (along with Scala, Swift, ...) instead seeks to have an exosystem, being the external industry-standard runtimes.  Another tension is between Enterprise and Openness.  So F# is open and cross-platform.  Tools are available for Android and iOS, as well as packages for Linux.

Functional <=> Objects
Thus F# embraces objects, without being object-oriented.  Some cases in the cross-product of the expected features for objects and functions require particular care for synthesis.

Circularities and Modularity in the Wild
Lambdas, generics, etc are clearly being embraced in modern language design.  However, circular type dependencies are unfortunately also widely present.  Languages need to enforce acyclicity.

Pattern Matching <=> Abstraction
How does the language support the functional concept of pattern matching, when you want to include type abstraction?  Alas, the speaker skipped the solution quickly.

Code <=> Data
Most development is about providing tools for the information revolution.  There is exponential growth in Open APIs for accessing data from the internet.  This data then comes with dynamic types, where the types are only known once the data (or schema) has been accessed.  The type creation can also enable blending code from other languages into the F# environment.  For example, this support allows opening csv or json files and having types for the data.  This feature is, by far, the most exciting and interesting part of the presentation.  Not quite worth the price of admission, but clearly a great development.

Applied PL design comes from the synthesis at the heart of these contradictions.  This tension also is part of the proliferation of languages.

Conference Attendance FCRC - Day 4 - PLDI

PLDI starts off this morning with Concurrency.  As a student volunteer, I worked this session and was limited as to what I could note about the content itself.

Composing Concurrency Control - Introducing more diverse and finer-grained locking mechanisms.  The tool works to develop a locking strategy that will guarantee serializability, abort-safety, opacity, and deadlock-freedom.  It particularly works to integrate both locking schemes as well as transactional memory.

In the afternoon, I can dive into the semantics of the C language.

A Formal C Memory Model Supporting Integer-Pointer Casts - What optimizations are possible in the presence of pointers, pointer arithmetic, and integer-pointer casts?  For example, can constants be propagated, or is their location potentially targetable by a pointer?  Other optimizations are explored in their paper.  In practice, as code can generate arbitrary addresses, how can the compiler reason about any specific location in memory?

Defining the Undefinedness of C - This extends their prior work that gave semantics to the defined behavior of C programs, which required doubling the rules to describe the semantic behavior.  Fundamentally, any instance of undefined behavior that will definitely be encountered in an execution invalidates that entire execution.  For example, for a divide-by-zero that follows a printf, it is valid for the program to crash before the printf.  The following code example is also undefined.
return (x = 1) + (x = 2);
Many of these cases are dependent on runtime behavior, and therefore a tool that can help identify them is valuable.

Monday, June 15, 2015

Conference Attendance FCRC - Day 3 - PLDI / ISCA

PLDI itself began this morning and after the welcome, we had three distinguished papers.  I am excited that two of these works focused on code performance and compilers, rather than higher-level programming language issues:

Automatically Improving the Accuracy of Floating Point Expressions - How do you address rounding error in your code?  Using formal numeric methods, an expert can reduce the errors.  But rather than requiring an expert, they wrote a tool that uses heuristics to apply these methods.  For example, consider the error when evaluating the quadratic formula: based on just the value of b, there are different expressions that have much lower error.
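The quadratic formula is the classic example; here is a sketch (mine, not Herbie's output) of the rewrite for the root that suffers catastrophic cancellation:

#include <math.h>

void roots_naive(double a, double b, double c, double *x1, double *x2)
{
    double sd = sqrt(b * b - 4.0 * a * c);
    *x1 = (-b + sd) / (2.0 * a);   /* cancels badly when b > 0 and b is near sd */
    *x2 = (-b - sd) / (2.0 * a);
}

void roots_stable(double a, double b, double c, double *x1, double *x2)
{
    double sd = sqrt(b * b - 4.0 * a * c);
    double q = -0.5 * (b + copysign(sd, b));  /* add magnitudes: no cancellation */
    *x1 = q / a;
    *x2 = c / q;                              /* via Vieta: x1 * x2 = c / a */
}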

The tool, Herbie, estimates the accuracy of the expression and then attempts algebraic transformations (from a database of 120 rules).  Having generated many candidate expressions, the tool then uses dynamic programming to select an appropriate set of expressions across the input space.  It matches the example cases from Hamming's Numerical Methods book, and furthermore it has found bugs in existing projects.

Diagnosing Type Errors with Class - SHErrLoc works to identify the likely cause of type errors.  Expressions are given constraints.  These constraints form a graph, which is analyzed for failing paths.  The tool then attempts to localize the failure and identify the minimal change to the constraints to satisfy the graph.  Even though it is not Haskell-specific, it is more accurate at detecting type errors in Haskell programs than related work.

Provably Correct Peephole Optimizations with Alive - Compilers are buggy.  For example, LLVM's InstCombine is a pass that rewrites the LLVM IR to improve performance and contains many hand-rolled transformations.  The authors propose a DSL for describing peephole optimizations, where the DSL is basically a simplified LLVM IR annotated with preconditions for the transformation.  Each transformation is passed through constraint checkers to verify that it is correct, and then C++ code is generated for that transformation.

A correct transformation must not introduce new undefined behaviors, must still produce the same result, and must properly update the memory state.  They initially proved the optimizations in InstCombine correct or identified bugs, and eventually could replace the pass with the generated version.  Furthermore, Alive was able to strengthen the post-conditions for many instructions (for example, identifying whether an operation will overflow).

In the afternoon, I was visiting the other side of the conference center with ISCA.  One paper of note there:

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing - They showed that (somehow, as I missed this point) by integrating simple cores closer to the memory system, they pushed past the memory bandwidth of HMC (640GB/s) to about 2.3TB/s.  They focused on two pieces: an updated programming model for graphs and a prefetching system within their cores.  The model introduces asynchronous remote procedure calls that are sent to the Tesseract core near the data.  These messages accumulate in a queue until either a barrier or the queue is full.  While they accumulate, the prefetcher requests the appropriate data so that when the function fires, the data is available.  The prefetcher operates on two separate streams: the local processing that is sequential and generating the remote requests, and the remote requests received at this node.

Sunday, June 14, 2015

Conference Attendance FCRC - Day 1 - WCAE / SPAA

In Portland for the next 5 days attending the Federated Computing Research Conference, which is a vast co-location of the top ACM conferences.  For my part, this includes ISCA and PLDI.  Following registration and checking in as a student volunteer, I ducked in to the Workshop on Computer Architecture Education (WCAE).  There were a couple of presentations on different tools being used to teach architectural concepts.

Following the morning break, it was time for the keynote for SPAA, given by Hans-J Boehm, titled, "Myths and Misconceptions about Threads".  For example,
#include "foo"
f() {
    foo_t x, a;
    ...
    x = a; // Is this atomic?
}
This led to the discussion of "is assignment atomic?", and the audience tossed out increasingly complex examples of how it is not.  Fundamentally, the programming model is becoming "data-race free", and the specifications can treat races as "undefined behavior".  In general, a sequential program will view its execution as following the sequential consistency model, even if the hardware is executing the code with a weaker model.

What then should the programming language provide for atomics / synchronization?  Recall that the compiler has considerable flexibility in emitting the final program.  With data-race-free code, the compiler treats anything that is not an atomic as part of sequential code and therefore subject to any reordering that would still be valid sequentially.  The following example shows how this can go awry.  Here x is a global, and the compiler could substitute x anyplace tmp is used, because the model assumes there are no races on x.  And if the program does happen to modify x in a racy manner, then the behavior is undefined.
bool tmp = x;
if (tmp) f = new ...
...
if (tmp) f->foo();
Gah!  The programmer wanted to take a snapshot of the global value, but ended up with a different result.  So the atomics are becoming more than just the "hacker's" way to quickly update shared values; instead they can be seen as annotations to the compiler that clearly encapsulate the shared state.  This means the type of x is not bool, but atomic<bool>.  Then the compiler knows the programmer's (likely) intent for this code.  And this then rolls back to a deeper question of my research: "What could the system do more efficiently if it knew more about the programmer's intent?"
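In C11 terms, a minimal sketch of the fix (the elided pieces from the talk's example are kept as comments):

#include <stdatomic.h>
#include <stdbool.h>

atomic_bool x;  /* annotation: x is shared; races on it are no longer "undefined" */

void use_snapshot(void)
{
    bool tmp = atomic_load(&x);     /* one explicit read; the compiler may not */
    if (tmp) { /* f = new ... */ }  /* substitute x for tmp behind our back    */
    /* ... */
    if (tmp) { /* f->foo(); */ }
}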

Wednesday, May 13, 2015

Course Design Series (Post 1 of N): Why Study Programming Languages

Obviously, there are some practitioners and researchers who base their living on programming language design.  But what of the rest of us?  The Communications of the ACM ran a short article on why: Teach foundational language principles.  Language design is increasingly focused on programmer productivity and correctness.  As programmers, are we aware of the new features and programming paradigms?

Particularly, let's look at three aspects of programming languages: contracts, functional languages, and type systems.  By introducing each to budding programmers, we improve their habits, just as presently I include comments and other style components in project grades.  First, contracts provide a well-defined mechanism for specifying the requirements of each component in a system.  Not just commenting each component and interface, but doing so in a manner that permits the compiler / runtime system to verify and enforce it.
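A minimal sketch in C, using assert as a poor man's contract (real contract systems check such conditions automatically at interface boundaries); the function mean is my hypothetical example:

#include <assert.h>
#include <stddef.h>

/* precondition: xs is non-NULL and n > 0; enforced, not just commented */
double mean(const double *xs, int n)
{
    assert(xs != NULL && n > 0);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += xs[i];
    return s / n;
}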

Functional languages are well known, and whether or not you may ever use one, they teach valuable techniques and mental models regarding programming.  A programmer should use pure and side-effect free procedures, rather than interleaving logic across different components.  Other practices, such as unit testing, also trend toward clean interfaces; however, being forced to obey these rules via the underlying language is great practice.

Type systems are treated by the authors as a panacea, as they call for language designers to be "educated in the formal foundations of safe programming languages - type systems."  Panacea or not, reasoning about functionality within the bounds of types leads to code that is clearer and more maintainable.  Even in a weak language, such as C, one can use enums rather than everything being "int".
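A tiny sketch of that last point (my example):

enum mode { MODE_IDLE, MODE_RUN, MODE_HALT };  /* hypothetical domain of values */

/* With int, any value is accepted silently; with the enum, intent is
 * explicit and the compiler can warn on a switch that misses a case. */
static const char *mode_name(enum mode m)
{
    switch (m) {
    case MODE_IDLE: return "idle";
    case MODE_RUN:  return "run";
    case MODE_HALT: return "halt";
    }
    return "unknown";
}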

As these aspects are being increasingly used in various forms in current language design, programmers need to be knowledgeable of them in order to be effective and use the languages to their full potential.  It is therefore incumbent on me, and other educators, to appropriately include these aspects when we teach about programming languages.

Which leads me back to writing a course description and learning objectives for my fall course.  Maybe later this month once I figure out why one of the textbooks still hasn't arrived.

Friday, April 17, 2015

Repost: Code Quality

xkcd had a great comic today about the code quality of self-taught programmers.  While there are technically trained programmers who write poor-quality code, my impression is that this is more common with self-taught programmers, as well as programmers who are only taught programming itself.  Basically, as part of the technical training, someone learns more than just programming.  They learn about CS theory, data structures, multiple programming languages, and are more often exposed to well written / designed programs.  Learning to program at a high quality is similar to learning a natural language, in that you study grammar and spelling, you read great works of literature and analyze them, and you practice reading / writing / speaking.

Each month I understand better what it takes to be competent in my field and also respect more the idea that curriculum is established by experts for a purpose, whether or not I may like certain topics.

Monday, April 13, 2015

Book Review: Structured Parallel Programming

At SIGCSE this year, I spoke with several publishers about possible textbooks.  Primarily, I looked for ones that would work in the different classes I might teach, as well as those that relate to my present research.  In this case, I received a copy of Structured Parallel Programming: Patterns for Efficient Computation from the publisher to review and consider for possible future use in a course.

This work is in three parts: the basics of parallelism and performance, the common patterns in which parallelism is expressed, and example implementations of several algorithms.  The second part is the core of the work, presenting map, reduce, scatter and gather, stencil, fork-join, and pipeline.  But before we learn those details, we come to key quotes for all that I do:
You cannot neglect the performance of serial code, hoping to make up the difference with parallelism.
And:
[The] performance of scalar processing is important; if it is slow it can end up dominating performance.
Therefore, parallelism is a method for improving the performance of already efficient code.

With both the common patterns, as well as the example implementations, the authors generally provide the source code for each pattern and implementation using Cilk, TBB, and OpenMP.  This source is not for casual readers.  More involved implementations can stretch for several pages, as the initial implementation and then subsequent refinements are explored.  While it serves well as a reference, it may have worked better to focus on one parallelism approach for each section and therefore give further explanation to the code, especially the language features used.  And thereby retain the pattern itself rather than becoming a practitioners' reference.
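For flavor, here is a minimal sketch (mine, not the book's code) of the map and reduce patterns in OpenMP-style C:

/* map: each iteration is independent of the others */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* reduce: combine partial sums with an associative operation */
float sum(int n, const float *x)
{
    float s = 0.0f;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}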

The example implementations (the third part) are perhaps the least interesting for the classroom and potentially the most interesting for practitioners.  Clearly, if I was trying to write code similar to one of these problems, I would have an excellent reference and starting point.  However, that is quite rarely the case for myself and I suspect most people as well.

If I were to teach a parallel programming course, I might consider using this work (although I still have other, similar textbooks to review); however, were I to do so, I would confine my teaching to the first two parts and maybe even to just one parallel programming paradigm.  Yes, I will admit that the last parallel programming course I took covered a diversity of paradigms (Cilk, vectorization, GPUs, OpenMP, MPI), yet I would have preferred to focus on what one or two paradigms are capable of rather than just a taste of many.  Parallel programming takes a lot of work to learn, and this book is one piece in that effort.

Thursday, April 2, 2015

Course Design Series (Post 0 of N): A New Course

This fall I will again be Instructor of Record.  My appointment is to teach CS 4392 - Programming Languages.  This is an unusual situation in that the course has not been taught for over 5 years (last time was Fall 2009).  Effectively, the course will have to be designed afresh.

The first step in course design (following L. Dee Fink and McKeachie's Teaching Tips: Chapter 2) is to write the learning objectives using the course description, along with prerequisites and courses that require this one.  Let's review what we have:

Course Description: none

Prerequisites:
Undergraduate Semester level CS 2340 (which has the following description)
Object-oriented programming methods for dealing with large programs. Focus on quality processes, effective debugging techniques, and testing to assure a quality product.

Courses depending on this: none

Alright.  Now I will turn to the CS curriculum at Georgia Tech.  Georgia Tech uses a concept they call "threads", which are sets of related courses.  CS 4392 is specifically in the Systems and Architecture thread.  This provides several related courses:

CS 4240 - Compilers, Interpreters, and Program Analyzers
Study of techniques for the design and implementation of compilers, interpreters, and program analyzers, with consideration of the particular characteristics of widely used programming languages.

CS 6241 - Compiler Design
Design and implementation of modern compilers, focusing upon optimization and code generation.

CS 6390 -  Programming Languages
Design, structure, and goals of programming languages. Object-oriented, logic, functional, and traditional languages. Semantic models. Parallel programming languages.

Finally, the ACM has provided guidelines for the CS curriculum.  Not only does this provide possible options for what material I should include, but they have also provided several ACM exemplar courses (c.f., http://www.cs.rochester.edu/courses/254/fall2013/ and http://courses.cs.washington.edu/courses/cse341/13sp/).

To summarize, if my first step is to write the learning objectives, then I am on step 0: write the course description.  In a couple of weeks, I plan on finishing my initial review of potential textbooks as well as the other materials covered above.  That will provide me the groundwork for the description and then objectives.

Friday, March 27, 2015

PhD Seminar - teaching, writing and oral presentation skills

Every month here at Georgia Tech, we are getting together to cover some of the things that PhD students should know / learn during their studies.  This month we gathered to cover various communication skills.  I have copied the points from today's seminar:

Teaching Advice
  • Strive for engagement
  • Trust your expertise
  • Conference Talk != Teaching
  • Get Students to Talk
Writing Advice
  • Write for a general audience occasionally
  • Read to find models of writing
  • Book Recs: Oxford Guide to Plain English / Bugs in Writing
  • It is a process
  • Find out what motivates you to write
  • Inside-Out style
  • Pay attention to copyediting and learn
Presentation Advice
  • Motivate them to want to read the paper
  • The audience doesn't know what you don't tell them
  • Toastmasters (or Techmasters at GT)
  • Slides as a reminder
  • Eye contact
  • Repeat the question (everyone has heard it and you know what is being asked)

Tuesday, March 10, 2015

Book Review: Geek Sublime: The Beauty of Code, the Code of Beauty

Geek Sublime: The Beauty of Code, the Code of Beauty is a book of such promise, yet it fails.  The first third of this work followed the hopeful theme, intertwining the story of programming, its style, and the author's complex relationship with his Indian heritage, his desire to be an artist (writer, etc), and his ability to make money programming.  An interesting twining that kept encouraging me to read, yet some worrisome signs developed.

The first worry was my skipping a long section on how computers work.  This *is* my field of Computer Science, so I wasn't interested in reading the basics.  Worse was noticing some inaccuracies, which established a concerning level of understanding of Computer Science, particularly in light of the author's non-technical background.  Livable, er, readable, sure.

It was then the author's intent to show the great intellectual contributions made by Indians.  I have no dispute with this point, except that it does not actually fit the established theme of the work.  Finding that Sanskrit has a codified grammar of great antiquity enabled the author in this quest.  Alas, these were long pages that grew increasingly divorced from the first part of the title, "The Beauty of Code", and focused further on the latter, "The Code of Beauty", with this code deriving from Indian literary traditions.

In the end, the book concluded: a crescendo of final claims, each reading as if the next page would be the last, until there was truly no text left.  I learned something about Indian culture by reading this book, except that was not why I read it.  I did not gain what I sought, so I cannot recommend reading it, nor care to provide a link to it.

Saturday, March 7, 2015

Conference Attendance SIGCSE 2015 - Day 2 / 3



I recognize that Day 1 afternoon went “missing”.  I presented my poster, and that consumed the sum total of my time.  While I am happy with all that I achieved with my poster (writing the IRB protocol, independent work, analyzing my teaching, et cetera), it was not selected as a finalist for the student research competition (SRC).  Yet I received significant feedback and a number of follow-ons that I will have to try to evaluate the next time(s) I teach.  I have been doing an excellent job of networking and speaking with my colleagues.  And I have seen several exciting techniques to improve my teaching.

This was a small Birds of a Feather (BoF) session, but we discussed some approaches and things to know before attending your first conference, and not just for undergraduates.  In some cases, undergraduates are attending their local institution and may never have traveled any significant distance.  What do they really need to bring?



In traveling, take some time to prepare students.  Let them know what to expect.  For example, it is okay to miss some paper sessions, and even to return to your room entirely.  It is okay to ask questions 1:1.  Find groups where people are being introduced and join in.  Student volunteering, while it takes time, also provides additional individuals whom you will know.  Use the people you know to introduce you to others at the conference.

This is just what it sounds like: a Ruby-based framework that enables writing simple unit tests that are then applied to a full simulation of the executed assembly.

The presenter(s) were not at this poster, but it showed a high-quality interface for seeing the scheduling of threads according to different scheduling policies.  The intent here was not to explore races and parallelism, but rather to see how scheduling decisions are made in an OS.

I was not expecting this poster.  You are walking along and then see four Raspberry Pis all networked together.  Raspberry Pis and HPC?!  A small setup, but it is an interesting development that takes advantage of the low-cost Pi and still provides an HPC platform for students.

Plastic parts all worked together to form replicas of Pascal's mechanical calculator.  Interesting and student assembled.

Teams of 4 students, with the approach evaluated on courses from three years of the major (sophomore on up).  Teams are formed with CATME (particularly using dissimilar GPAs in a group), as well as partner selection (when possible).  Students provide peer evaluations after each stage of the project.  Significant data collection looked particularly at what students prefer the evaluation policy to be (from 100% of the grade for the group's work to 100% of the grade for the individual's contribution).  This question was asked repeatedly throughout the semester, which raises the question: do student preferences change?  More senior students prefer more weight being attributed to the group.  The best predictor of the preferred grade split is the point in the course at which it is surveyed; effectively, as soon as the teams are formed, the students prefer to be graded primarily as a group.  A follow-on study is looking at experience with team projects, trust in the ability to evaluate individual contributions, and other questions.  This is a hopeful data point.

How do faculty become aware of teaching practices, and why do they try them out?  66 participants in CS, including chairs, tenure-track faculty, teaching faculty, and Ph.D. student instructors across 36 institutions.  First, the mental model of what an instructor does can differ significantly from what the instructor is actually doing.  Second, faculty find out about practices through a variety of approaches, such as self-identifying that there is possible improvement in their teaching.  Faculty often trust other faculty like them (researchers trust researchers, lecturers trust lecturers).  Third, when adopting a practice, faculty need to evaluate its effectiveness (see also my poster, student feedback, etc).  -- My efforts in this have been having different faculty (my recommendation letter writers) view my lectures / teaching, and thereby giving them demonstrations of different practices.
"We lost the war on cheating"  Instead, we have to meet with students such that they are demonstrating their understanding of the code.  The requirements of submissions: attribute your sources and understand your submission.  Enables students to work together, use all sources, develop interview skills.  Enables reuse of assignments.  Grading is now 40% correctness / 60% code interview.  Rubric for each interview.  Students should arrive early and have their laptop ready to present / explain.  Students were better able to learn and complete the assignments, as well as feedback for improvement.  Students also felt better able to learn the material by being able to collaborate and not constrained by a collaboration policy.  There are some stressors, such as TAs having to meet with hundreds of students, as well as their inconsistencies.  -- This was perhaps the most exciting new technique that I saw / heard about.

Thursday, March 5, 2015

Conference Attendance SIGCSE 2015 - Day 1 Morning

It is colder here in Kansas City.  Fortunately, I will only be outside briefly.  Most often I will be networking and continuing my efforts to both become a better teacher, as well as finding an academic job teaching.

This morning, I am focusing on the "Curriculum" track.  I am excited by the three papers in this track, the first looks at research, the second is on systems courses, and the last on parallel computing courses.  Alas, I was in the hallway track and missed the first work.  Perhaps I can find the authors later.

Backward Design: An Integrated Approach to a Systems Curriculum
The goal of systems is "higher level software creation".  Computer Science courses are split into Core Tier 1 and Tier 2 (terms from the ACM 2013 curriculum), where the former are taken by all CS majors and the latter are taken only by most or some.  One issue in the old curriculum was that OS also taught C.  In crafting a new curriculum, first establish a vision statement, which can be used in conflict resolution (and also revised).  Then establish SMART objectives to prepare and build the assessments.  The results can be found on github.

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing
Parallel computing is important, and important for CS graduates to know.  The 2013 ACM Curriculum increased the number of hours that students should spend on parallel computing.  Part of the recommendation is to place parallel computing throughout the curriculum and not just in a single course.  Thus parallelism modules are placed throughout the curriculum (perhaps as early as CS1 or CS2).  Find the level of abstraction for a concept and introduce it appropriately, for example, Amdahl's Law in CS1 versus cache coherence in a senior-level class.  Five modules of parallelism were established, which have equivalences with the ACM recommendations.  Each course in the curriculum may have one or more modules, which then teach and reinforce the topics.  Even after adding these modules, there has continued to be incremental development and revision, which has improved student outcomes.  The key takeaway is that it is possible to introduce these recommendations without completely rewriting the curriculum.

In the afternoon, I will be standing with my poster -  Using Active Learning Techniques in Mixed Undergraduate / Graduate Courses.  Later I will post updates from my afternoon.

Monday, February 23, 2015

Compilers and Optimizations, should you care?

Compiler optimizations matter.  One time in helping a fellow Ph.D. student improve a simulator's performance, I did two things: first, I replaced an expensive data structure with a more efficient one.  Second, I turned on compiler optimizations.  Together, the simulator ran 100x faster.

A question posted on the stackexchange system asked, "Why are there so few C compilers?"  The main answer pointed out that any C compiler needs to be optimizing.  Lots of optimizations occur on every compilation, each one gaining the tiniest increments in performance.  While I enjoy discussing them in detail, I generally wave my hands and tell of how they are good, yet make debugging difficult.  These optimizations are lumped together as the "optimization level".

In "What Every Programmer Should Know About Compiler Optimizations", we revisit optimizations.  First, the compiler is no panacea and cannot correct for inefficient algorithms or poor data structure choices (although I am party to research on the later).  The article then suggests four points to assist the compiler in its efforts at optimizing the code. 

"Write understandable, maintainable code."  Please do this!  Usually, the expensive resource is the programmer.  So the first optimization step is to improve the programmer's efficiency with the source code.  Remember Review: Performance Anti-patterns and do not start optimizing the code until you know what is slow.

"Use compiler directives."  Scary.  Excepting the inline directive, I have used these less than a half dozen times in almost as many years of performance work.  Furthermore, the example of changing the calling convention is less relevant in 64-bit space where most conventions have been made irrelevant.

"Use compiler-intrinsic functions."  (see Compiler Intrinsics the Secret Sauce)  This can often dovetail with the first point by removing ugly bit twiddling code and putting in clean function calls.

"Use profile-guided optimization (PGO)."  This optimization is based on the dynamic behavior of the program.  Meaning that if you take a profile of the program doing X, and later the program does Y; executing Y can be slower.  The key is picking good, representative examples of the program's execution.

So you have dialed up the optimization level and written understandable code sprinkled with intrinsics.  Now what?  The next step (with which I agree) is to use link-time optimization (LTO) / Link-Time Code Generation (LTCG).  This flag delays many optimizations until the entire program is available to be linked.  One of the principles of software performance is that the more of the program that is available to be optimized, the better it can be optimized.  (This principle also applies in computer architecture.)  Thus, by delaying many optimizations until the entire program is available, the linker can find additional and better opportunities than were present in individual components.
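For concreteness, here are the GCC spellings of these flags (a sketch; representative_input is a stand-in for your actual training workload, and MSVC spells LTO as /GL plus /LTCG):

gcc -O2 -flto a.c b.c -o app               # link-time optimization
gcc -O2 -fprofile-generate app.c -o app    # PGO step 1: build instrumented
./app < representative_input               # PGO step 2: run training workload
gcc -O2 -fprofile-use app.c -o app         # PGO step 3: rebuild using the profile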

The article notes, "The only reason not to use LTCG is when you want to distribute the resulting object and library files."  And alas, I have fought several battles to overcome this point, as my work requires the use of LTO.  Perhaps in the next decade, LTO will be standard.