Wednesday, April 3, 2013

Architecture Abstraction Leaks

As a computer architect, I like to think that the hardware abstraction is generally opaque.  If a programmer really needs performance or certain effects, then it is possible to reach through the layer, but in general the computer gives you what you want.  Sometimes, a programmer can do simple things that pull the covers back and reveal the architecture in all of its gritty detail.  One example would be following row versus column indexing when looping over large pieces of data.

In this post, I wanted to draw your attention to a question posed on stackoverflow, Why is processing a sorted array faster than an unsorted array?  In this question, we are ignoring the time to sort the array.  The operation being applied to each array element is relatively simple and is conditional on the value of the element.  By sorting the array, the same operations could be applied to contiguous elements.  And as the same operations are being applied, the code in question was having a clear win from branch prediction.

But the branch predictor is abstracted hardware, so once again the programmer just learned about the architecture when he / she didn't intend to.

Wednesday, March 27, 2013

Transactional Memory and Intel's TSX

It was some years ago that I sat in the audience and heard AMD present a sketch for how transactional memory (TM) would be implemented in the x64 ISA.  More recently a fellow student mentioned that Intel has some new extensions entering the x64 ISA for locks, etc.  I'm always a fan of properly implemented locks, as they so often limit performance and scalability.  So let's dig into Intel's TSX and why I actually want to go buy a gadget when it's released.

A programmer can delineate the transactional section with XBEGIN and XEND instructions.  Within the transactional section, all reads and writes are added to a read- or a write-set accordingly.  The granularity for tracking is a cache line.  If another processor makes a read request to a line in the write-set or either request to a read-set, then the transaction aborts.

Transactions can be semi-nested.  A transaction can only commit if the outer transaction is complete.  Internally nested transactions do not commit on XEND.  If any transaction in the nest aborts, then the entire transaction aborts.  If |XBEGIN| equals |XEND|, then the entire transaction commits and becomes globally visible.  Transactions can be explicitly aborted by the XABORT instruction, which enables the code to abort early when it can determine that the transaction will or should fail.

As I understand it, TSX is being built on top of the existing cache coherence mechanisms.  Each cache line gains an additional bit to indicate if it is part of a transaction.  Each memory operation is treated normally between the processor and the coherence hierarchy with several caveats.  If a dirty, transactional block is evicted, then the transaction fails.  If a dirty, transactional block is demand downgraded from modified to shared or invalid, then the transaction fails.  In this case, a new message would indicate that the request to forward the data fails and the request should be satisfied by memory.

If the transaction commits, then the transactional bits are cleared on each cache line.  And the lines operate normally according to existing cache coherence mechanisms.

Wrapping this up, TSX is an ISA extension that almost every program can take advantage of and therefore has an appeal toward conducting personal testing, just like building my own x64 machine back in 2005.

Monday, March 4, 2013

Maps and Red-Black Trees

When cooperating on a prior work of research, Brainy: effective selection of data structures, I learned about the cost of selecting particular data structures.  I expect that every Computer Scientist knows the general cost of using standard data structures like trees and linked lists.  Now, the extension that Brainy provided was to recognize that a "tree" could have different implementations and these implementations can have different costs for a given workload.  I, and I expect others, learned about AVL, splay, red-black and other tree types.

All this is good until your type hides its implementation and you reason about it in abstract.  A map is commonly misinterpreted by its implementation.  A map is a key-value store, where a "key" has an associated "value".  An address book can be understood as a map of name to address / phone number / etc.

Now, the map is assumed to be implemented as a hash table, to give O(1) look-up cost.  However in the C++ standard library, map uses a red-black tree for O(log n).  Before you go and request that the libraries be changed, understand that experimental measurements suggest that the red-black implementation wins out when n > 500, as the hash collisions dominate the idealized O(1) cost.  This is the problem that Brainy attempts to solve: the programmer selects the functionality, e.g., map and Brainy selects the best implementation given your program and architecture.

So go ahead and use a map, but recognize that you may not have O(1) cost and definitely won't as n grows increasingly large.

Monday, January 28, 2013

Repost: When Beauty is Not Truth

While it was not a field discussed in the article, When Beauty Is Not Truth, nonetheless I wonder about the focus I put on elegance in programming (starting with the name of the blog).  So let's consider a couple of quick things about Computer Science and the beauty of the code.  I should add that the article discusses a rough equivalence between beauty, elegance, and simplicity.

First, beautiful code is not always correct.

Second, beautiful code, by virtue of its simplicity, is less likely to have bugs.

Third, beautiful code is more readable, which facilitates comprehension.

(Now back to preparing the lecture for today's class)

Tuesday, December 11, 2012

Repost: A New View of Exceptions

Sometimes you read or hear a claim and think, "That's totally wr... right."  Oh, it does work that way.  Exceptions are Shared Secrets is of this vein.  Exceptions are a secret, side-channel between the raiser and the proper handler.  No other component should be aware of the exception.  Right?

Friday, November 2, 2012

Computer Science Education: Student Conceptions

I'm sitting in Michael Hewner's thesis defense on "Student Conceptions About the Field of Computer Science". And I think there are some interesting points raised that are worth reflecting on. Let's start with summary questions.

Who would be considered a Computer Scientist?

How does a student choose his or her particular focus in Computer Science?

Do misconceptions about Computer Science affect the student's education?

Students have three conceptions about what Computer Science is, which were first derived from interviews and reaffirmed through a 100 student survey. The Theory-view (8%) is that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) is that all Computer Science is about programs, where one is either analyzing the basis of or the direct work on programming. The Broad-view (27%) is that Computer Science is a giant umbrella of disciplines where computers are involved. A final 23% of the survey responses were without clear category, which may be due to the limitation of the original interviews. All conceptions view algorithms and data structures as a vital component to Computer Science.

Students use enjoyment of classes as the dominant metric of whether they have an affinity for the area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, which would commonly be viewed as an unpleasant time. Since students use enjoyment as their metric, students are not particularly affected by their misconceptions about what the course(s) contains.

Clearly then, the enjoyability of a course can then affect the taking of subsequent courses. Which in future work, there may be an exploration of whether course enjoyment effects (like scheduled time) has an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum as providing an adequate preparation for practicing Computer Science, which is to say that they are prepared as long as they satisfy the requirements regardless of any attempt to have a focus in their course selection. Should the curriculum then have unpleasant courses to force students into specializing?

For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.

Wednesday, September 26, 2012

Book Review: Coders at Work

I thought Coders at Work: Reflections on the Craft of Programming was a much more enjoyable read than the previously reviewed, Masterminds of Programming. Primarily as there wasn't a focus on finding conflict, I could instead enjoy hearing about how each person came to learn programming / computer science and their experiences with working on projects both small and large. By the end of the book, I wanted to go write code on a project, which suggests it was far more inspirational than most books I read.

Notable quotes follow:

"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch

Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

"Types are essentially assertions about a program." - Dan Ingalls

"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix

"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen

"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell