A discussion of how to do Computer Science well, particularly writing code and architecting program solutions.
Wednesday, March 27, 2013
Transactional Memory and Intel's TSX
A programmer can delineate the transactional section with XBEGIN and XEND instructions. Within the transactional section, all reads and writes are added to a read- or a write-set accordingly. The granularity for tracking is a cache line. If another processor makes a read request to a line in the write-set or either request to a read-set, then the transaction aborts.
Transactions can be semi-nested. A transaction can only commit if the outer transaction is complete. Internally nested transactions do not commit on XEND. If any transaction in the nest aborts, then the entire transaction aborts. If |XBEGIN| equals |XEND|, then the entire transaction commits and becomes globally visible. Transactions can be explicitly aborted by the XABORT instruction, which enables the code to abort early when it can determine that the transaction will or should fail.
As I understand it, TSX is being built on top of the existing cache coherence mechanisms. Each cache line gains an additional bit to indicate if it is part of a transaction. Each memory operation is treated normally between the processor and the coherence hierarchy with several caveats. If a dirty, transactional block is evicted, then the transaction fails. If a dirty, transactional block is demand downgraded from modified to shared or invalid, then the transaction fails. In this case, a new message would indicate that the request to forward the data fails and the request should be satisfied by memory.
If the transaction commits, then the transactional bits are cleared on each cache line. And the lines operate normally according to existing cache coherence mechanisms.
Wrapping this up, TSX is an ISA extension that almost every program can take advantage of and therefore has an appeal toward conducting personal testing, just like building my own x64 machine back in 2005.
Monday, March 4, 2013
Maps and Red-Black Trees
All this is good until your type hides its implementation and you reason about it in abstract. A map is commonly misinterpreted by its implementation. A map is a key-value store, where a "key" has an associated "value". An address book can be understood as a map of name to address / phone number / etc.
Now, the map is assumed to be implemented as a hash table, to give O(1) look-up cost. However in the C++ standard library, map uses a red-black tree for O(log n). Before you go and request that the libraries be changed, understand that experimental measurements suggest that the red-black implementation wins out when n > 500, as the hash collisions dominate the idealized O(1) cost. This is the problem that Brainy attempts to solve: the programmer selects the functionality, e.g., map and Brainy selects the best implementation given your program and architecture.
So go ahead and use a map, but recognize that you may not have O(1) cost and definitely won't as n grows increasingly large.
Monday, January 28, 2013
Repost: When Beauty is Not Truth
First, beautiful code is not always correct.
Second, beautiful code, by virtue of its simplicity, is less likely to have bugs.
Third, beautiful code is more readable, which facilitates comprehension.
(Now back to preparing the lecture for today's class)
Tuesday, December 11, 2012
Repost: A New View of Exceptions
Friday, November 2, 2012
Computer Science Education: Student Conceptions
Who would be considered a Computer Scientist?
How does a student choose his or her particular focus in Computer Science?
Do misconceptions about Computer Science affect the student's education?
Students have three conceptions about what Computer Science is, which were first derived from interviews and reaffirmed through a 100 student survey. The Theory-view (8%) is that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) is that all Computer Science is about programs, where one is either analyzing the basis of or the direct work on programming. The Broad-view (27%) is that Computer Science is a giant umbrella of disciplines where computers are involved. A final 23% of the survey responses were without clear category, which may be due to the limitation of the original interviews. All conceptions view algorithms and data structures as a vital component to Computer Science.
Students use enjoyment of classes as the dominant metric of whether they have an affinity for the area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, which would commonly be viewed as an unpleasant time. Since students use enjoyment as their metric, students are not particularly affected by their misconceptions about what the course(s) contains.
Clearly then, the enjoyability of a course can then affect the taking of subsequent courses. Which in future work, there may be an exploration of whether course enjoyment effects (like scheduled time) has an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum as providing an adequate preparation for practicing Computer Science, which is to say that they are prepared as long as they satisfy the requirements regardless of any attempt to have a focus in their course selection. Should the curriculum then have unpleasant courses to force students into specializing?
For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.
Wednesday, September 26, 2012
Book Review: Coders at Work
Notable quotes follow:
"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch
Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
"Types are essentially assertions about a program." - Dan Ingalls
"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix
"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen
"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell
Wednesday, August 29, 2012
Computer Science Education (part 0 of N)
What are common naive theories that students have entering the course?
How might the course be designed to encourage students to reconstruct their theories?
So what naive theories do students have?
First, computers are magical. No, computers do exactly what a programmer tells them to do. (More advanced students learn about race conditions, compiler influence on correctness, etc). Which, unfortunately, means that if a computer is not doing what you want it to do, then you instructed it incorrectly (c.f., The rat is always right).
Second, I'm going to be a game programmer. No, most computer scientists do not write games (or at least, aren't paid to). But we find many other interesting parts to the field. Besides, many game programmers are treated little better than grad students.
Do you know other naive theories?
Then after class, I spent some time discussing more "advanced" theories in computer science.
Functional versus imperative programming. Does one paradigm exist to rule them all? Is one class of programming languages sufficient? Do students gain by learning about both paradigms? I discussed this briefly in Is versus ought, and have been regularly reading a strong function view in Existential Type.
Big 'O' notation and algorithm / data structure selection. I previously discussed this some in Know your N. And was co-author on a paper, "Brainy: effective selection of data structures", that demonstrated actual data structure selection for a program is not always best from the "Big 'O'" point of view.
Language equivalence. Related to functional versus imperative and one of my first posts, "Problem Solving via Programming", programming languages are theoretically equivalent (i.e., turning complete). But in practice languages should be selected for particular problems. What problems are best for specific languages?
What are some other major theories about computer science that students should know?