Elegant C: abstraction

Friday, September 14, 2018

Is C low level?

A recent ACM article, C is Not a Low-Level Language, argues that for all of our impressions that C is close to hardware, it is not actually "low-level". The argument is as follows, C was written for the PDP-11 architecture and at the time, it was a low-level language. As architectures have evolved, C's machine model has diverged from hardware, which has forced processor design to add new features to attain good performance with C, such as superscalar for ILP and extensive branch prediction.
Processors must also maintain caches to hide the memory latency, which require significant logic to maintain coherence and the illusion that the memory is shared between the threads of a process. Furthermore, the compiler is also called upon to find optimization opportunities that may be unsound and definitely require programmer years to implement.

The author repeatedly contrasts with GPUs, while noting that they require very specific problems, or "at the expense of requiring explicitly parallel programs". If you were not keeping track, a GPU requires thousands of threads to match the CPU's performance. The author calls for, "A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model." Which generally sounds like the GPU design.

I appreciate the callouts to C's shortcomings, which it certainly has. The notion that C has driven processor design is odd, yet it does reflect the fact that processors are designed to run current programs fast. And with the programs being written in either C or a language built on C, that forces many programs into particular patterns. I even spent some time in my PhD studies considering a version of this problem: how do you design a new "widget" for the architecture if no programs are designed for widgets to be available?

All to say, I think C is a low-level language, and while its distance from hardware may be growing, there is nothing else beneath it. This is a gap that needs to be addressed, and by a language that has explicit parallel support.

Wednesday, April 3, 2013

Architecture Abstraction Leaks

As a computer architect, I like to think that the hardware abstraction is generally opaque. If a programmer really needs performance or certain effects, then it is possible to reach through the layer, but in general the computer gives you what you want. Sometimes, a programmer can do simple things that pull the covers back and reveal the architecture in all of its gritty detail. One example would be following row versus column indexing when looping over large pieces of data.

In this post, I wanted to draw your attention to a question posed on stackoverflow, Why is processing a sorted array faster than an unsorted array? In this question, we are ignoring the time to sort the array. The operation being applied to each array element is relatively simple and is conditional on the value of the element. By sorting the array, the same operations could be applied to contiguous elements. And as the same operations are being applied, the code in question was having a clear win from branch prediction.

But the branch predictor is abstracted hardware, so once again the programmer just learned about the architecture when he / she didn't intend to.