Elegant C

Wednesday, March 27, 2013

Transactional Memory and Intel's TSX

It was some years ago that I sat in the audience and heard AMD present a sketch for how transactional memory (TM) would be implemented in the x64 ISA. More recently a fellow student mentioned that Intel has some new extensions entering the x64 ISA for locks, etc. I'm always a fan of properly implemented locks, as they so often limit performance and scalability. So let's dig into Intel's TSX and why I actually want to go buy a gadget when it's released.

A programmer can delineate the transactional section with XBEGIN and XEND instructions. Within the transactional section, all reads and writes are added to a read- or a write-set accordingly. The granularity for tracking is a cache line. If another processor makes a read request to a line in the write-set or either request to a read-set, then the transaction aborts.

Transactions can be semi-nested. A transaction can only commit if the outer transaction is complete. Internally nested transactions do not commit on XEND. If any transaction in the nest aborts, then the entire transaction aborts. If |XBEGIN| equals |XEND|, then the entire transaction commits and becomes globally visible. Transactions can be explicitly aborted by the XABORT instruction, which enables the code to abort early when it can determine that the transaction will or should fail.

As I understand it, TSX is being built on top of the existing cache coherence mechanisms. Each cache line gains an additional bit to indicate if it is part of a transaction. Each memory operation is treated normally between the processor and the coherence hierarchy with several caveats. If a dirty, transactional block is evicted, then the transaction fails. If a dirty, transactional block is demand downgraded from modified to shared or invalid, then the transaction fails. In this case, a new message would indicate that the request to forward the data fails and the request should be satisfied by memory.

If the transaction commits, then the transactional bits are cleared on each cache line. And the lines operate normally according to existing cache coherence mechanisms.

Wrapping this up, TSX is an ISA extension that almost every program can take advantage of and therefore has an appeal toward conducting personal testing, just like building my own x64 machine back in 2005.

Monday, March 4, 2013

Maps and Red-Black Trees

When cooperating on a prior work of research, Brainy: effective selection of data structures, I learned about the cost of selecting particular data structures. I expect that every Computer Scientist knows the general cost of using standard data structures like trees and linked lists. Now, the extension that Brainy provided was to recognize that a "tree" could have different implementations and these implementations can have different costs for a given workload. I, and I expect others, learned about AVL, splay, red-black and other tree types.

All this is good until your type hides its implementation and you reason about it in abstract. A map is commonly misinterpreted by its implementation. A map is a key-value store, where a "key" has an associated "value". An address book can be understood as a map of name to address / phone number / etc.

Now, the map is assumed to be implemented as a hash table, to give O(1) look-up cost. However in the C++ standard library, map uses a red-black tree for O(log n). Before you go and request that the libraries be changed, understand that experimental measurements suggest that the red-black implementation wins out when n > 500, as the hash collisions dominate the idealized O(1) cost. This is the problem that Brainy attempts to solve: the programmer selects the functionality, e.g., map and Brainy selects the best implementation given your program and architecture.

So go ahead and use a map, but recognize that you may not have O(1) cost and definitely won't as n grows increasingly large.

Monday, January 28, 2013

Repost: When Beauty is Not Truth

While it was not a field discussed in the article, When Beauty Is Not Truth, nonetheless I wonder about the focus I put on elegance in programming (starting with the name of the blog). So let's consider a couple of quick things about Computer Science and the beauty of the code. I should add that the article discusses a rough equivalence between beauty, elegance, and simplicity.

First, beautiful code is not always correct.

Second, beautiful code, by virtue of its simplicity, is less likely to have bugs.

Third, beautiful code is more readable, which facilitates comprehension.

(Now back to preparing the lecture for today's class)

Tuesday, December 11, 2012

Repost: A New View of Exceptions

Sometimes you read or hear a claim and think, "That's totally wr... right." Oh, it does work that way. Exceptions are Shared Secrets is of this vein. Exceptions are a secret, side-channel between the raiser and the proper handler. No other component should be aware of the exception. Right?

Friday, November 2, 2012

Computer Science Education: Student Conceptions

I'm sitting in Michael Hewner's thesis defense on "Student Conceptions About the Field of Computer Science". And I think there are some interesting points raised that are worth reflecting on. Let's start with summary questions.

Who would be considered a Computer Scientist?

How does a student choose his or her particular focus in Computer Science?

Do misconceptions about Computer Science affect the student's education?

Students have three conceptions about what Computer Science is, which were first derived from interviews and reaffirmed through a 100 student survey. The Theory-view (8%) is that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) is that all Computer Science is about programs, where one is either analyzing the basis of or the direct work on programming. The Broad-view (27%) is that Computer Science is a giant umbrella of disciplines where computers are involved. A final 23% of the survey responses were without clear category, which may be due to the limitation of the original interviews. All conceptions view algorithms and data structures as a vital component to Computer Science.

Students use enjoyment of classes as the dominant metric of whether they have an affinity for the area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, which would commonly be viewed as an unpleasant time. Since students use enjoyment as their metric, students are not particularly affected by their misconceptions about what the course(s) contains.

Clearly then, the enjoyability of a course can then affect the taking of subsequent courses. Which in future work, there may be an exploration of whether course enjoyment effects (like scheduled time) has an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum as providing an adequate preparation for practicing Computer Science, which is to say that they are prepared as long as they satisfy the requirements regardless of any attempt to have a focus in their course selection. Should the curriculum then have unpleasant courses to force students into specializing?

For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.

Wednesday, September 26, 2012

Book Review: Coders at Work

I thought Coders at Work: Reflections on the Craft of Programming was a much more enjoyable read than the previously reviewed, Masterminds of Programming. Primarily as there wasn't a focus on finding conflict, I could instead enjoy hearing about how each person came to learn programming / computer science and their experiences with working on projects both small and large. By the end of the book, I wanted to go write code on a project, which suggests it was far more inspirational than most books I read.

Notable quotes follow:

"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch

Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

"Types are essentially assertions about a program." - Dan Ingalls

"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix

"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen

"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell

Wednesday, August 29, 2012

Computer Science Education (part 0 of N)

Besides my usual semesters of computer science courses and research, this fall I'm cross-enrolled at a neighboring university that offers education classes. Last night had some very interesting conversations. We are starting to prepare syllabi for the course we'd either ideally teach or want to be prepared to teach. I was in a group with two education PhDs (most of the class are). They consented to consider an introductory computer science course and answer two questions.

What are common naive theories that students have entering the course?
How might the course be designed to encourage students to reconstruct their theories?

So what naive theories do students have?

First, computers are magical. No, computers do exactly what a programmer tells them to do. (More advanced students learn about race conditions, compiler influence on correctness, etc). Which, unfortunately, means that if a computer is not doing what you want it to do, then you instructed it incorrectly (c.f., The rat is always right).

Second, I'm going to be a game programmer. No, most computer scientists do not write games (or at least, aren't paid to). But we find many other interesting parts to the field. Besides, many game programmers are treated little better than grad students.

Do you know other naive theories?

Then after class, I spent some time discussing more "advanced" theories in computer science.

Functional versus imperative programming. Does one paradigm exist to rule them all? Is one class of programming languages sufficient? Do students gain by learning about both paradigms? I discussed this briefly in Is versus ought, and have been regularly reading a strong function view in Existential Type.

Big 'O' notation and algorithm / data structure selection. I previously discussed this some in Know your N. And was co-author on a paper, "Brainy: effective selection of data structures", that demonstrated actual data structure selection for a program is not always best from the "Big 'O'" point of view.

Language equivalence. Related to functional versus imperative and one of my first posts, "Problem Solving via Programming", programming languages are theoretically equivalent (i.e., turning complete). But in practice languages should be selected for particular problems. What problems are best for specific languages?

What are some other major theories about computer science that students should know?