Friday, July 20, 2012

Repost: Programmer Jargon

I'd like to highlight a few items of new programmer jargon.

1 - I'm thrilled to learn of the "Yoda condition", so now I have a name for this silly way of writing a conditional (see the sketch after this list). Can anyone tell me why you'd write the condition this way?

8 - Heisenbugs are something I've been referring to for over 10 years.

20 - I'll need to inform students that "ninja comments" are not acceptable documentation.
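
For anyone who hasn't seen one, here is a minimal sketch of a Yoda condition in C (my own example, not from the reposted list); the rationale usually offered is that putting the constant on the left turns an accidental assignment into a compile error.

#include <stdio.h>

void check(int x)
{
    if (5 == x)                  /* the "Yoda condition": constant on the left */
        printf("five (Yoda)\n");

    if (x == 5)                  /* the conventional ordering */
        printf("five (conventional)\n");

    /* Mistyping == as = in the Yoda form, if (5 = x), fails to compile,
       while if (x = 5) silently assigns and is accepted. */
}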

Tuesday, July 10, 2012

Meta Post

It took 677 days, but I've now received 1000 views. Just over 50% of views are from the United States and 60% of views are from machines running Windows. Now, I'd say that I'm going to get back to research, but I'm taking today off to run errands with my wife.

Thursday, May 31, 2012

Know your "N"

I was once working on some performance analysis and went to the developers asking why their code was consuming a disproportionate amount of time. At low load, the code was, say, 5% of non-idle CPU usage, but at high load it was now 50%. The developers noted that the code contained an algorithm that was O(n^2), but they had only ever run it with n = 1, 2, ..., 10. My high-load test was n = 1000, which exposed the scaling issues in their code. The developers had done everything right, except knowing the scope of the problem (and to be fair, their code was originally written for desktops and I was testing servers).
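
As a toy illustration (my numbers and code, not the developers'): a 100x larger input makes an O(n^2) loop do 10,000x the work.

#include <stdio.h>

/* Counts the iterations of a generic O(n^2) nested loop. */
static long quadratic_work(long n)
{
    long work = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            work++;                      /* one unit of work per inner pass */
    return work;
}

int main(void)
{
    printf("n=10:   %ld units\n", quadratic_work(10));    /* 100 */
    printf("n=1000: %ld units\n", quadratic_work(1000));  /* 1,000,000 */
    return 0;
}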

Another time a developer came to discuss memory limitations due to using 64-bit counters. I asked how large the input was and learned it was under a billion elements. Well, a billion fits comfortably in a 32-bit counter (which can hold up to about 4.29 billion), so switching cut the memory consumption in half. (Maybe I need more stories, as this also appears in Bit Packing.)

The first step in any good performance analysis is selecting an algorithm based on the problem, including its size. Bubble sort an array of 10 elements, but not one of 10,000. In fact, ask whether the problem really requires sorting at all.

I will continue to strongly urge developers to measure the performance of their code before trying to tune it. But part of that measurement is knowing the "size" of the inputs.

Thursday, May 10, 2012

The Minimum of Computer Science

The NY Times ran an article at the start of last month on what CS education is needed by non-majors: Computer Science for Non-Majors. Is knowing how to write a (simple) program so vital to modern work that it should rank with multiplication tables, English grammar, etc.? And if so, how much knowledge and in what language(s) should this be taught?

Should someone learn the basics of JavaScript, so they can add to webpages? Perhaps a little Python to do simple data processing? But probably not C, as the language expresses how the computer is to do something rather than just what should be done.

Edit: A strong response to this general question here.

Tuesday, April 3, 2012

Rules of Computer Science

As I've spent these many years learning and practicing computer science, I've come to have a couple of "rules" (i.e. guidelines) of programming.

1) If a computer can execute something, you can write it in C. (aka Turing-completeness)

1a) Still, it is probably faster to write the program in a high-level language.

2) The computer does what it is told to do. There is no magic.

2a) If a behavior is not what you expect, then you don't know what you told the computer to do.

I am sure there are other such "rules", so please share if you have other maxims from your time and experience.

Friday, March 2, 2012

Usage of NULL

I read an argument recently about whether a test for NULL should be:

if (ptr)

- or -

if (ptr != NULL)

Did you know that the assembly generated for these two tests is the same? So the first form has not optimized the running time of the program. It does save 8 characters, which mattered long ago, but now we have plenty of disk space. So always write the test for NULL by actually testing against NULL, or I'll come take off style points the next time I'm grading. (If you are writing one-use code that no one will ever see, then go ahead.)
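
A minimal sketch of the two forms side by side (hypothetical function names); any reasonable compiler emits the same compare-and-branch for both, so the choice is purely one of style.

#include <stddef.h>

/* Both functions typically compile to the identical test-and-branch sequence. */
int is_set_short(const char *ptr)
{
    if (ptr)                /* implicit comparison against NULL */
        return 1;
    return 0;
}

int is_set_explicit(const char *ptr)
{
    if (ptr != NULL)        /* explicit comparison, easier on the grader */
        return 1;
    return 0;
}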

Separately, the operating system provides explicit support for NULL. It sets up every process's virtual memory so that virtual addresses 0x0 - 0x3ff (and possibly more) have no mapping, so any access through NULL will never be valid.

Tuesday, February 14, 2012

Bit Packing: A World Gone Mad

Let's start with a mythical program. It has some classes (structs, objects, etc) that are instantiated. Each field has a meaning given to it by the code. Many fields can hold far more data than is required of them. In our mythical program, these unused bits are of no concern, as memory is plentiful. But then the mythical program is run on a real machine and bad things happen. Maybe it runs slowly, or crashes when malloc() returns NULL. Now what?!

In this post, I'm going to discuss some tricks and ideas for reducing the memory usage of a program. Before trying these steps, you should understand the program and its workload (using profilers, etc.) and verify that the program has no memory leaks.

First example. A program was running out of memory. This program went through a memory trace and counted where each memory request went. A count existed for every N locations of memory, and each count was an unsigned 64-bit integer (uint64_t). With 1 billion memory operations, the worst case is a single count of 1 billion (roughly 2^30). With a fairly uniform distribution, each of a billion counters would hold 10, perhaps 100. Switching to a 32-bit integer reduced the program's footprint by 4GB, allowing work to continue. Other proposals included using a two-level counter scheme, where overflow of a small counter (16-bit) switches to a full-sized (64-bit) counter, or tracking only a subset of the counters via random sampling.
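
A rough sketch of the two-level counter proposal (the sizes and names are mine, and error handling is omitted): keep a dense array of 16-bit counters for the common case, with a small side table of 64-bit counters for the rare locations that overflow.

#include <stdint.h>
#include <stdlib.h>

#define NCOUNTERS (1u << 20)      /* example: 1M counted locations */
#define SMALL_MAX UINT16_MAX

static uint16_t small[NCOUNTERS]; /* 2 bytes per location in the common case */

/* Sparse side table for the few counters that exceed 16 bits. */
struct overflow { uint32_t index; uint64_t count; };
static struct overflow *big;
static size_t nbig;

static uint64_t *find_big(uint32_t i)
{
    for (size_t k = 0; k < nbig; k++)   /* a real version would use a hash table */
        if (big[k].index == i)
            return &big[k].count;
    return NULL;
}

void counter_inc(uint32_t i)
{
    uint64_t *b = find_big(i);
    if (b)
        (*b)++;
    else if (small[i] < SMALL_MAX)
        small[i]++;
    else {                              /* promote this location on overflow */
        big = realloc(big, (nbig + 1) * sizeof *big);
        big[nbig].index = i;
        big[nbig].count = (uint64_t)small[i] + 1;
        nbig++;
    }
}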

The lesson learned is that having some spare bits is acceptable, but counting to, say, "100" in a counter that can go to 1.84467441 × 10^19 might be excessive.

Second example. I've probably written about this before, and I'll likely do so again. Locks are often a waste of space. Consider the following:
void atomic_add(int *loc, int val)
{
    lock();        /* acquire a lock protecting *loc (placeholder lock/unlock) */
    *loc += val;   /* read-modify-write while holding the lock */
    unlock();      /* release the lock */
}

Looks fine, right? Consider that the lock is 4 bytes of space, of which 1 bit indicates whether it is acquired. Two compactions are possible. First, the lock could be integrated into the value itself. This would reduce the range of the value, as one bit is now dedicated to holding the lock state. Second, many architectures (like x86) support the above operation as a single instruction, so using that instruction removes the need for a lock bit / word entirely.
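
Here is what the second compaction looks like with C11 atomics (a sketch of mine, not from the original code); on x86 it typically compiles down to a single lock-prefixed add, with no separate lock word at all.

#include <stdatomic.h>

/* The whole locked section collapses into one atomic read-modify-write. */
void atomic_add(atomic_int *loc, int val)
{
    atomic_fetch_add(loc, val);
}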

Why don't we always replace the locks with atomic instructions? Two reasons: first, if the locked section is any more complex, then the atomic sequence becomes vastly more complex, as only the simplest operations (add, subtract, or, and, etc.) are supported with atomicity. Second, determining the appropriate transformation in the compiler is exceedingly complex and may not be solvable in the general case. Since the compiler can't do this in general, it falls to the programmer to introduce the change.

Third example. Storing lots of pointers to long lists of numbers (these could be strings, or perhaps lists of edges in a graph). On a 64-bit system, each pointer is a 64-bit number... or is it? Current x86-64 architectures are limited to 48 bits of virtual address, to reduce the overhead of tracking the virtual memory space. So every user-space pointer gives us 48 bits of information and 16 bits of zeros. (Many pointers are also aligned, which zeros some of the low-order bits.) And whenever the programmer knows a bit's value at compile time, that bit is effectively a constant, free to be reused!

A common operation is to ask for the length of a list. Ideally, this value is stored. But where? If the leading pointer has 16 available bits, then the length could be made part of the pointer. But what if the length could be 65,536 (2^16) or greater? A hybrid floating-point design could be used.

[I/F:1][LENGTH:15][ADDR:48]

The first bit indicates whether the length field holds an integer or a floating-point value. The next 15 bits store the length: as an exact integer if it fits, otherwise as a float with an exponent and a fractional component. Six exponent bits would cover any length that fits in a 64-bit number (note that these floats only need to represent lengths greater than 2^15, since smaller lengths use the precise integer representation). Alternatively, 4 exponent bits would cover roughly 2^16 through 2^31. The remainder of the 15 bits holds the fraction.
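
To make the simpler, integer-only version of this trick concrete (my sketch; exact lengths up to 2^16 - 1, no float mode): pack a 16-bit length into the unused upper bits of a 48-bit user-space pointer.

#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack a user-space pointer and a 16-bit length into one 64-bit word. */
static uint64_t pack(const void *ptr, uint16_t len)
{
    return ((uint64_t)len << ADDR_BITS) | ((uintptr_t)ptr & ADDR_MASK);
}

static void *unpack_ptr(uint64_t word)
{
    return (void *)(uintptr_t)(word & ADDR_MASK);
}

static uint16_t unpack_len(uint64_t word)
{
    return (uint16_t)(word >> ADDR_BITS);
}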

Most programs will not need bit packing. Every time a value is packed or unpacked, there is the cost of extra instructions. But sometimes you will just need to save the space, even if it means making the design more complex.