A discussion of how to do Computer Science well, particularly writing code and architecting program solutions.
Tuesday, December 11, 2012
Repost: A New View of Exceptions
Friday, November 2, 2012
Computer Science Education: Student Conceptions
Who would be considered a Computer Scientist?
How does a student choose his or her particular focus in Computer Science?
Do misconceptions about Computer Science affect the student's education?
Students hold one of three conceptions of what Computer Science is, first derived from interviews and reaffirmed through a 100-student survey. The Theory-view (8%) holds that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) holds that Computer Science is all about programs, whether analyzing the basis of programming or doing the direct work of programming. The Broad-view (27%) holds that Computer Science is a giant umbrella of disciplines in which computers are involved. A final 23% of the survey responses fit no clear category, which may be due to the limitations of the original interviews. All conceptions view algorithms and data structures as a vital component of Computer Science.
Students use enjoyment of classes as the dominant metric of whether they have an affinity for an area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, commonly viewed as an unpleasant time. And since enjoyment is their metric, students are not particularly affected by their misconceptions about what a course contains.
Clearly, the enjoyability of a course can affect the taking of subsequent courses. Future work might explore whether factors in course enjoyment (like scheduled time) have an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum to provide adequate preparation for practicing Computer Science, which is to say that they consider themselves prepared as long as they satisfy the requirements, regardless of any attempt to focus their course selection. Should the curriculum then include unpleasant courses to force students into specializing?
For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.
Wednesday, September 26, 2012
Book Review: Coders at Work
Notable quotes follow:
"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch
Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
"Types are essentially assertions about a program." - Dan Ingalls
"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix
"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen
"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell
Wednesday, August 29, 2012
Computer Science Education (part 0 of N)
What are common naive theories that students have entering the course?
How might the course be designed to encourage students to reconstruct their theories?
So what naive theories do students have?
First, computers are magical. No, computers do exactly what a programmer tells them to do. (More advanced students learn about race conditions, compiler influence on correctness, etc.) Which, unfortunately, means that if a computer is not doing what you want it to do, then you instructed it incorrectly (cf. "The rat is always right").
Second, I'm going to be a game programmer. No, most computer scientists do not write games (or at least, aren't paid to). But we find many other interesting parts to the field. Besides, many game programmers are treated little better than grad students.
Do you know other naive theories?
Then after class, I spent some time discussing more "advanced" theories in computer science.
Functional versus imperative programming. Does one paradigm exist to rule them all? Is one class of programming languages sufficient? Do students gain by learning about both paradigms? I discussed this briefly in Is versus ought, and have been regularly reading a strongly functional view in Existential Type.
Big 'O' notation and algorithm / data structure selection. I previously discussed this some in Know your N, and was a co-author on a paper, "Brainy: effective selection of data structures", which demonstrated that the best data structure for a given program is not always the one suggested by the "Big 'O'" point of view.
Language equivalence. Related to functional versus imperative and to one of my first posts, "Problem Solving via Programming": programming languages are theoretically equivalent (i.e., Turing complete), but in practice languages should be selected for particular problems. What problems are best for specific languages?
What are some other major theories about computer science that students should know?
Friday, July 20, 2012
Repost: Programmer Jargon
1 - I'm thrilled to learn of the "Yoda condition", so now I have something to call this silly way of writing the condition. Can anyone tell me why you'd write the condition this way?
8 - Heisenbugs are something I've been referring to for over 10 years.
20 - I'll need to inform students that "ninja comments" are not acceptable documentation.
Tuesday, July 10, 2012
Meta Post
Thursday, May 31, 2012
Know your "N"
Another time, a developer came to discuss memory limitations due to using 64-bit counters. I asked how large the input was and learned it was under a billion elements. Well, a billion fits into a 32-bit counter, and now the memory consumption is cut in half. (Maybe I need more stories, as this one also appears in Bit Packing.)
The first step to any good performance analysis is selecting an algorithm, based on the problem (including its size). Bubble sort an array of 10 elements, but not 10,000. In fact, ask whether the problem really requires sorting at all.
I will continue to strongly urge developers to measure the performance of their code before trying to tune it. But part of this measurement is knowing the "size" of the inputs.
Thursday, May 10, 2012
The Minimum of Computer Science
Should someone learn the basics of JavaScript, so they can add to webpages? Perhaps a little Python to do simple processing of data? But probably not C, as the language expresses how the computer is to do something rather than what should be done.
Edit: A strong response to this general question here.
Tuesday, April 3, 2012
Rules of Computer Science
1) If a computer can execute something, you can write it in C. (aka Turing-completeness)
1a) Still, it is probably faster to write the program in a high-level language.
2) The computer does what it is told to do. There is no magic.
2a) If a behavior is not what you expect, then you don't know what you told the computer to do.
I am sure there are other such "rules", so please share if you have other maxims from your time and experience.
Friday, March 2, 2012
Usage of NULL
if (ptr)
- or -
if (ptr != NULL)
Do you know that the assembly for these two sequences is the same? So the first form has not optimized the running time of the program. It merely saves 8 characters, which mattered long ago, but now we have plenty of disk space. So always write the test for NULL by actually testing against NULL. Or I'll come take off style points the next time I'm grading. (If you are writing one-use code that no one will see, then go ahead.)
Separately, the operating system provides explicit support for NULL. It sets up every process's virtual memory so that the lowest virtual addresses, at least the first page (0x0 - 0xfff with 4KB pages, and possibly more), have no mapping, so any access through NULL will never be valid.
Tuesday, February 14, 2012
Bit Packing: A World Gone Mad
In this post, I'm going to discuss some tricks and ideas for reducing the memory usage of a program. Before trying these steps, you should know the program and its workload through profilers, etc., and verify that the program has no memory leaks.
First example. A program was running out of memory. The program walked a memory trace and counted where each memory request went. A count existed for every N locations of memory, and each count was an unsigned 64-bit integer (uint64_t). With 1 billion memory operations, the worst case is a single count of 1 billion (roughly 2^30). With a fairly uniform distribution, each counter would instead hold 10, perhaps 100. Switching to 32-bit integers reduced the program's footprint by 4GB, allowing work to continue. Other proposals included using a two-level counter scheme, where overflow of a small (16-bit) counter results in using a full-sized (64-bit) counter, or tracking only a subset of the counters via random sampling.
The lesson learned is that having some spare bits is acceptable, but counting to, say, "100" when the location can go to 1.84467441 × 10^19 might be excessive.
Second example. I've probably written about this before, and I'll likely do so again. Locks are often wastes of space. Consider the following:
void atomic_add(int* loc, int val) {
    lock();
    *loc += val;
    unlock();
}
Looks fine, right? Consider that the lock is 4 bytes of space, of which 1 bit indicates whether it is acquired. Two compactions are possible. First, the lock could be integrated into the value itself. This would reduce the range of the value, as one bit is now dedicated to holding the lock state. Or second, many architectures (like x86) support the above operation as a single instruction. So using that instruction removes the need for a lock bit / word entirely.
Why don't we always replace the locks with atomic instructions? Two reasons: first, if the locked section is ever more complex, then the atomic sequence becomes vastly more complex, as only the simplest of operations (add, subtract, or, and, etc) are supported with atomicity. Second, the determination of the appropriate transformations in the compiler is exceedingly complex and may not be solvable in the general case. Since the compiler wouldn't be able to do this generally, it looks instead to the programmer to introduce this change.
Third example. Storing lots of pointers to long lists of numbers (these could be strings, or perhaps lists of edges in a graph). On a 64-bit system, each pointer is a 64-bit number, or is it? Current x86-64 architectures are limited to 48 bits of virtual address, to reduce the overhead of tracking the virtual memory space. So every pointer gives us 48 bits of information and 16 bits of zeros. (Many pointers are also aligned, which zeroes some of the low-order bits.) And whenever the programmer knows a bit's value at compile time, that bit is effectively a constant, free to be reclaimed for other data.
A common operation is to ask for the length of a list. Ideally, this value is stored. But where? If the leading pointer has 16 available bits, then the length could be made part of the pointer. But what if the length could be greater than 32,768 (2^15, the most the length field can count once one bit is reserved as a flag)? A hybrid floating-point design could be used.
[I/F:1][LENGTH:15][ADDR:48]
The first bit indicates whether an integer or a floating-point length is stored. If it is an integer, the next 15 bits store the length directly. If it is a float, those 15 bits split into an exponent and a fractional component. Six exponent bits suffice to scale to any 64-bit length (note that these floats only need to represent lengths greater than 2^15, since smaller lengths use the precise integer representation). Or 4 exponent bits would cover the range 2^16 - 2^31. The remainder of the 15 bits holds the fraction.
Most programs will not need bit packing. Every packed field carries the cost of extra instructions to encode and decode it. But sometimes you will just need to save the space, even if it means making the design more complex.