A discussion of how to do Computer Science well, particularly writing code and architecting program solutions.
Friday, September 14, 2018
Is C low level?
Processors must also maintain caches to hide the memory latency, which require significant logic to maintain coherence and the illusion that the memory is shared between the threads of a process. Furthermore, the compiler is also called upon to find optimization opportunities that may be unsound and definitely require programmer years to implement.
The author repeatedly contrasts CPUs with GPUs, while noting that GPUs require very specific problems, or come "at the expense of requiring explicitly parallel programs". If you were not keeping track, a GPU requires thousands of threads to match the CPU's performance. The author calls for this: "A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model." That generally sounds like the GPU design.
I appreciate the callouts to C's shortcomings, which it certainly has. The notion that C has driven processor design is odd, yet it does reflect the fact that processors are designed to run current programs fast. And with the programs being written in either C or a language built on C, that forces many programs into particular patterns. I even spent some time in my PhD studies considering a version of this problem: how do you design a new "widget" for the architecture if no programs are designed for widgets to be available?
All to say, I think C is a low-level language, and while its distance from hardware may be growing, there is nothing else beneath it. This is a gap that needs to be addressed, and by a language that has explicit parallel support.
Monday, August 27, 2018
Book Review: Valley of Genius
All in all I enjoyed the book, but that only merits 3.5/5 stars.
(A free copy of the book was provided through Goodreads.)
Tuesday, July 31, 2018
Book Review: The Art of Application Performance Testing
In testing applications, a performance analyst needs to establish a representative workload, which includes the actions to perform and the combined load. For example, most users logging in to their bank will view their account balance, while others might transfer money or pay a bill. Combined, these actions might represent most of the work from users. Then, for each unit of server capacity, how many users should be able to perform a mix of those actions? That count forms the load.
After establishing the workload, the analyst needs to implement the described workload, which requires a tool that generates the load (either by driving the application itself or replaying a synthetic trace of the load). For those tools, what additional hardware is required to deploy this load? Does the deployment take into account geographic and other user variations (so that the load generation is representative of the user base)? Finally, what tooling and methodology exists for profiling and recording the execution of the workload for present and future analysis?
So I appreciated the content of the book and would recommend it to individuals focusing on testing of user-facing applications.
Wednesday, November 22, 2017
Review: Languages and Code Quality in GitHub
Wednesday, May 3, 2017
PhD Defense - Meeting Tail Latency SLOs in Shared Networked Storage
Workloads are generally bursty, and their characteristics are different. Furthermore, they may have service level objectives (SLOs), and the system needs to meet these different objectives. And the system contains a variety of resources that must be shared in some form. It is not sufficient to just divide the bandwidth. Nor can the system measure the latency and try reacting, particularly as bursty workloads do not give sufficient time to react. While each workload has deadlines, it would be too complex to tag request packets with the deadlines for queuing and routing. However, the deadlines can be used to generate priorities for requests.
The system is architected to have storage and network enforcement components to ensure QoS. There is also a controller that receives an initial trace to characterize each workload, and that workload's SLOs. The controller works through a sequence of analyses to successfully place each workload into the overall system.
Effectively, each workload is assigned a "bucket" of tokens, where the bucket size provides the ability to handle bursts and the rate at which tokens are added covers the request rate for the workload. Short, bursty workloads receive large buckets and low rates, while constant workloads with few bursts have high rates and small buckets. In both cases, the workload is rate-limited in its requests only when the bucket is empty, and these requests receive the lowest priority. Deterministic Network Calculus (DNC) is used to model the worst-case queue scenarios. This plots two curves, the requesting flow and the service curve, both plotted as tokens as a function of window size (dt). The maximum horizontal distance between the curves is the maximum latency.
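The bucket-and-rate idea can be sketched in a few lines of C. This is only an illustration of a generic token bucket, not the scheme's actual implementation: the names token_bucket, bucket_init, and bucket_try_consume are hypothetical, each request is assumed to cost one token, and the caller is assumed to supply the current time in seconds.

```c
#include <stdbool.h>

/* Hypothetical token bucket; all names here are illustrative. */
typedef struct {
    double tokens;     /* tokens currently available */
    double capacity;   /* bucket size: the largest burst admitted at once */
    double rate;       /* tokens added per second: the sustained request rate */
    double last_time;  /* time of the previous refill, in seconds */
} token_bucket;

void bucket_init(token_bucket *b, double capacity, double rate, double now)
{
    b->tokens = capacity;  /* start full so an initial burst is admitted */
    b->capacity = capacity;
    b->rate = rate;
    b->last_time = now;
}

/* Returns true if the request fits within the workload's bucket.
   A false return means the bucket is empty, so the caller would
   rate-limit the request and give it the lowest priority. */
bool bucket_try_consume(token_bucket *b, double now)
{
    /* Refill for the time elapsed since the last call, up to capacity. */
    b->tokens += (now - b->last_time) * b->rate;
    b->last_time = now;
    if (b->tokens > b->capacity)
        b->tokens = b->capacity;

    /* Assumption: every request costs exactly one token. */
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;
        return true;
    }
    return false;
}
```

A bursty workload would be initialized with a large capacity and a small rate, and a steady workload with the reverse.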
Using three traces (DisplayAds, MSN, and LiveMaps), they tested three approaches: Cake (a reactive approach), earliest deadline first, and Timothy's scheme (PriorityMeister). His scheme did significantly better than the others at meeting the SLOs. However, the DNC analysis was based on achieving 100% and not the SLO's 99% (or other percentile success). Depending on the characteristics, there can be significant differences between these guarantees. Stochastic Network Calculus (SNC) can model the latency percentiles; however, the math is significantly more complex, and it had not previously been applied to this problem. DNC is still better when assuming that bursts are correlated or the system is in an adversarial setting. Relaxing these assumptions (uncorrelated workloads), the SNC-based analysis permitted the system to admit 3x as many workloads as the DNC analysis.
Workloads have a curve of satisfying bucket size and token rate pairs. Many systems require the user to provide the rate limit. Other systems use simple heuristics to either find the "knee of the curve" or select a rate limit as a multiple of the average rate. However, for an individual workload, all pairs are satisfying; it is only when workloads are combined in a system that the different pairs matter. The configurations of the set of workloads on the system can be solved for using a system of linear equations. Therefore, when placing new workloads, the controlling architecture can find successful placements, while potentially reconfiguring the workloads already assigned.
One extension would be addressing failure modes. Currently, the system is assumed to be at degraded performance when components have failed.
Monday, May 1, 2017
Book Review: Multicore and GPU Programming: An Integrated Approach
First, the author elects to use QtThreads for the base parallel programming approach. He notes that he had considered using pthreads or C++11 thread support, and comments that he rejected C++11 threads for the book as the support was incomplete at the time. Pthreads is left without comment. That may be, but I have heard of and used those options, while QtThreads is something that I have never considered.
Second, the book serves as an admirable API reference. For one or a set of function calls, significant space is dedicated to how that call works and to illustrating examples for it. Structured Parallel Programming also covers many APIs, yet it maintains a feel of being about parallel programming in general rather than the calls specifically. However, that work covers different API sets, so the two are not explicitly comparable.
Third, and this issue is really more for the editors than the author, the typesetting on each page is poor. There is significant white space left bordering the text on each page. Furthermore, the code wraps, and often not because the code is long, but because of long comments and comments on the same line as the code. I understand that in programming these are stylistic choices; however, the line wraps from long comments leave the text looking unprofessional. I must assume that the author wrote the code separately and then provided it for inclusion in the book, but the editor failed to make allowances for typesetting.
In conclusion, I wanted to like and use the book. Whenever I speak with the publisher, they always direct me to it, so I just have to hope for something else to come along. Use it as a reference perhaps, but I am cautious in my recommendation.
(This book was provided free by the publisher to review for possible use in the classroom.)
Friday, April 21, 2017
Repost: What Makes a Program Elegant?
Monday, March 13, 2017
Book Review: Optimized C++: Proven Techniques for Heightened Performance
This book is not quite it. There is good material here, yet I found myself repeatedly thinking that the author was not aware of the past 10(?) years of changes to the field. That would not be an issue if the book were from the early 2000s, but it was published last year.
A key step in improving the performance of programs is measuring it. There are a variety of techniques for doing so: tools based on instrumentation and tools based on sampling profiling. I find greater value in using the sampling profiling tools (for measuring performance) due to their lower overhead and ability to pinpoint where in a function the cost exists. Yet the book's focus is limited to gprof-esque approaches. I tell students that this approach is best with deep call trees, which may be a greater issue for C++ programming specifically.
The author is somewhat dismissive of compiler optimizations and emphasizes that his observed benefit has been largely limited to function inlining. There are many more optimizations, and you should care about them. But again, I wonder if his experience of C++ has been deep call trees that could particularly benefit from inlining.
In a take-it-or-leave-it stance, this work also discourages the use of dynamic libraries. Yes, they impose a performance penalty, but they also provide valuable functionality. Whether you should statically or dynamically link your code depends on your use case. Code that is reused by separate executables should be in a dynamic library, as it reduces the memory requirements when running and reduces the effort to patch and update those executables. Components that are only used by a single executable should be statically linked, unless the components are of significant size such that decoupling can still benefit memory usage and the updating process.
The author related that replacing printf with puts to just print a string has performance advantages, due to printf being a complicated "God function". The basic point is valid that printf has significant functionality; however, the anecdote should be taken with a grain of salt. Current compilers such as GCC and Clang will typically do this optimization automatically, replacing a printf of a constant string that ends in a newline with puts.
While most of the work provides small examples, the final chapters on concurrency (?) and memory management do not. The concurrency chapter reads as a reference book, as it lists the various APIs available and what each does. It would be better for the book to assume that the readers are familiar with these calls (as the author does with many other topics) and discuss possible optimizations within this scope.
To conclude, the book is not bad, but I also cannot say it is accurate on every point. Especially with performance, programmers are apt to make prompt design decisions based on "their experience" or "recent publications". Measure your code's performance. Only then can you discern which techniques will provide value.
Thursday, December 29, 2016
Book Review: The Art of Readable Code
While the topic is style, this is really more about fundamentally good practices, which may be implemented in one of several ways (e.g., where to put braces, camel case, etc) that is termed style. Between this and the examples within the text, I want to start requiring it of students. I want them to read and see why style actually matters. Or maybe we will just have to wait until they experience why it matters and suffer for it.
Monday, April 13, 2015
Book Review: Structured Parallel Programming
This work is in three parts: the basics of parallelism and performance, the common patterns in which parallelism is expressed, and example implementations of several algorithms. The second part is the core of the work, covering map, reduce, scatter and gather, stencil, fork-join, and pipeline. But before reaching those details, we come to key quotes for all that I do:
"You cannot neglect the performance of serial code, hoping to make up the difference with parallelism."
And:
"[The] performance of scalar processing is important; if it is slow it can end up dominating performance."
Therefore, parallelism is a method for improving the performance of already efficient code.
With both the common patterns and the example implementations, the authors generally provide the source code for each pattern and implementation using Cilk, TBB, and OpenMP. This source is not for casual readers. More involved implementations can stretch for several pages, as the initial implementation and then subsequent refinements are explored. While it serves well as a reference, it may have worked better to focus on one parallelism approach for each section and therefore give further explanation to the code, especially the language features used. That would keep the focus on the pattern itself rather than turning the text into a practitioner's reference.
The example implementations (the third part) are perhaps the least interesting for the classroom and potentially the most interesting for practitioners. Clearly, if I was trying to write code similar to one of these problems, I would have an excellent reference and starting point. However, that is quite rarely the case for myself and I suspect most people as well.
If I were to teach a parallel programming course, I might consider using this work (although I still have other, similar textbooks to review); however, were I to do so, I would confine my teaching to the first two parts and maybe even to just one parallel programming paradigm. Yes, I will admit that the last parallel programming course I took covered a diversity of paradigms (Cilk, vectorization, GPUs, OpenMP, MPI), yet I would have preferred to focus more on what one or two paradigms are capable of rather than just a taste of many. Parallel programming takes a lot of work to learn, and this book is one piece in that effort.
Tuesday, March 10, 2015
Book Review: Geek Sublime: The Beauty of Code, the Code of Beauty
The first worry was skipping a long section on how computers work. This *is* my field of Computer Science, so I wasn't interested in reading the basics. Yet worse was noticing some inaccuracies. They established a certain level of understanding in Computer Science that was concerning, particularly in light of the author's non-technical background. Livable, er readable, sure.
It was then the author's intent to show the great intellectual contributions made by Indians. I have no dispute with this point, except that it wasn't actually fitting with the established theme of the work. Finding that Sanskrit has a codified language of great antiquity enabled the author in this quest. Alas, it was long pages that grew increasingly divorced from the first part of the title, "The Beauty of Code", and focused further on the latter, "The Code of Beauty", with this code deriving from Indian literary traditions.
In the end, the book concluded. A crescendo of final claims that each read like the next page would be the last, until such time as there was really no text left. I learned something about Indian culture by reading this book, except that was not why I read it. I did not gain in what I sought, so I cannot recommend reading it, nor care to provide a link to it.
Saturday, March 7, 2015
Conference Attendance SIGCSE 2015 - Day 2 / 3
In traveling, take some time to prepare students. Let them know what to expect. For example, it is okay to miss some paper sessions, and even return to your room entirely. It is okay to ask questions 1:1. Find groups where people are being introduced and join in. Student volunteering, while it takes time, also gives you additional individuals that you will know. Use the people you know to introduce you to others at the conference.
Thursday, March 5, 2015
Conference Attendance SIGCSE 2015 - Day 1 Morning
This morning, I am focusing on the "Curriculum" track. I am excited by the three papers in this track: the first looks at research, the second is on systems courses, and the last is on parallel computing courses. Alas, I was in the hallway track and missed the first work. Perhaps I can find the authors later.
Backward Design: An Integrated Approach to a Systems Curriculum
The goal of systems is "higher level software creation". Computer Science courses are split into Core Tier 1 and Tier 2 (a term from the ACM 2013 curriculum), where the former are taken by all CS majors and the latter are only taken by most or some. One issue in the old curriculum was that OS also taught C. In crafting a new curriculum, first establish a vision statement, which can be used in conflict resolution (and also revised). Establish SMART objectives to prepare and build the assessments. The results can be found on GitHub.
A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing
Parallel computing is important, and it is important for CS graduates to know. The 2013 ACM Curriculum increased the number of hours that students should take in parallel computing. Part of the recommendations is to place parallel computing throughout the curriculum and not just in a single course. Thus parallelism modules are placed throughout the curriculum (perhaps as early as CS1 or CS2). Find the level of abstraction for a concept and introduce it appropriately. For example, Amdahl's Law in CS1 versus cache coherence in a senior-level class. Five modules of parallelism were established, which have equivalences with the ACM recommendations. Each course in the curriculum may have one or more modules, which then teach and reinforce the topics. Even after adding these modules, there have continued to be incremental development and revisions, which have improved student outcomes. The key takeaway is that it is possible to introduce these recommendations without completely rewriting the curriculum.
In the afternoon, I will be standing with my poster - Using Active Learning Techniques in Mixed Undergraduate / Graduate Courses. Later I will post updates from my afternoon.
Monday, July 14, 2014
Book Review: The Practice of Programming
I recently found The Practice of Programming (Addison-Wesley Professional Computing Series) sitting on the shelf at my local library. I am generally skeptical when it comes to programming books, and particularly those from different decades, but I trusted the name "Brian Kernighan" so I checked the book out.
And I am so glad that I did. From the first chapter that discussed style, I wanted to read more. And the only reason to ever stop reading was to pull out a computer and put these things into practice. I didn't even mind that it wasn't until chapter 7 that performance was discussed. Still, I will readily acknowledge that I disagree with some of the statements in the book. Furthermore, there are some parts of the text that are clearly dated, like discussing current C / C++ standards.
I'd like to conclude with a brief code snippet from the work. This code is part of a serializer / deserializer. Such routines are always a pain to write, particularly if you have many different classes / structs that need them. Thus the authors suggest using varargs and writing a single routine that can handle this for you. Here is the unpack (i.e., deserialize) routine:
/* unpack: unpack packed items from buf, return length */
int unpack(uchar *buf, char *fmt, ...)
{
    va_list args;
    char *p;
    uchar *bp, *pc;
    ushort *ps;
    ulong *pl;

    bp = buf;
    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': /* char */
            pc = va_arg(args, uchar*);
            *pc = *bp++;
            break;
        case 's': /* short */
            ps = va_arg(args, ushort*);
            *ps = *bp++ << 8;
            *ps |= *bp++;
            break;
        case 'l': /* long */
            pl = va_arg(args, ulong*);
            *pl = *bp++ << 24;
            *pl |= *bp++ << 16;
            *pl |= *bp++ << 8;
            *pl |= *bp++;
            break; /* without this, 'l' falls through to the error case */
        default: /* illegal type character */
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return bp - buf;
}
So now we have a little language for describing the format of the data in the buffer. We invoke unpack with a string like "cscl" and pointers to store the char, short, char and long. Hah! That's it. Anytime we add new types, we just have to call pack / unpack.
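To make the "cscl" invocation concrete, here is a self-contained sketch. It restates the routine with the typedefs the snippet relies on, a break closing the 'l' case so that a long does not fall through to the error path, and a cast before the widest shifts (my addition, to avoid shifting into the sign bit of an int). The sample_msg layout and the unpack_sample wrapper are hypothetical names of my own, not from the book.

```c
#include <stdarg.h>

typedef unsigned char  uchar;
typedef unsigned short ushort;
typedef unsigned long  ulong;

/* unpack: unpack packed items from buf, return length (-1 on a bad format) */
int unpack(uchar *buf, char *fmt, ...)
{
    va_list args;
    char *p;
    uchar *bp = buf, *pc;
    ushort *ps;
    ulong *pl;

    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': /* char: one byte */
            pc = va_arg(args, uchar*);
            *pc = *bp++;
            break;
        case 's': /* short: two bytes, most significant first */
            ps = va_arg(args, ushort*);
            *ps = *bp++ << 8;
            *ps |= *bp++;
            break;
        case 'l': /* long: four bytes, most significant first */
            pl = va_arg(args, ulong*);
            *pl = (ulong)*bp++ << 24;
            *pl |= (ulong)*bp++ << 16;
            *pl |= (ulong)*bp++ << 8;
            *pl |= *bp++;
            break;
        default: /* illegal type character */
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(bp - buf);
}

/* Hypothetical message matching the format "cscl". */
typedef struct {
    uchar type;
    ushort length;
    uchar flag;
    ulong seq;
} sample_msg;

/* Decode one message; returns the number of bytes consumed from buf. */
int unpack_sample(uchar *buf, sample_msg *m)
{
    return unpack(buf, "cscl", &m->type, &m->length, &m->flag, &m->seq);
}
```

Supporting another message type is then one more format string and one more wrapper of this shape.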
Does it matter that the variables are only sequences like "pl" or "bp"? No. Variable names should be meaningful and consistent. "i" can be fine for a loop iterator.
We have given up some performance (*gasp*), but gained in the other parts that matter like readability and maintainability. I plan on using this in my current research (but only the unpack, as my serializers are already highly optimized). All in all, I approve of this book and may even someday require it as a textbook for students.
Tuesday, February 4, 2014
Book Review: ARM Assembly Language: Fundamentals and Techniques
This is the story of that book - ARM Assembly Language: Fundamentals and Techniques. An interesting book that covered the basics of what I wanted to learn, but the shortcoming was that it had an expected environment that was different from mine. ARM processors can be found in a greater diversity of devices than, say, x86. Yet, I am still thinking about the ARM processor as a drop-in replacement. I look more to devices like Microsoft's Surface or a smartphone, and think about the presence of an OS, etc.
I learned particularly that the ARM instructions have bits to make them predicated. And I realized then that conditional branches are really just predicated instructions. If the predicate(s) are true, then take the branch. Just another perspective on instruction sets. Anyway, I look forward to getting a Raspberry Pi, so I can try out some of what I've learned and get a chance to also work through the assembly generated by compilers.
Monday, September 23, 2013
Book Review: Turing's Cathedral
In contrast, a significant portion of the book was instead written as a history of the Institute. Given that the Institute provided access to the records used as a significant part of the source material, it is understandable that the author's focus would be so directed. However, it adds to the misleading focus that this work follows.
Of perhaps greater slight is that a work titled "Turing's Cathedral" only features Alan Turing for a small part of the writing. Instead we find greater focus placed on his work and how it fit into the research of that time, eventually leading von Neumann to explore the usage of electronic digital computers to solve the US military's problems. He, like many European scientists, had left his homeland ahead of Hitler, and these scientists supported work leading to Germany's defeat.
The grand development that von Neumann introduced was making a computer programmable. Beyond just reconfigurable, the project he led at the IAS was programmable: the electronic device could store both data and codes that were instructions for what the device was to do. Consider that for the next 50 years, programs would be constrained by having to store the instructions in memory, which was often a very limited resource.
So John von Neumann stars in a book titled for Turing, and a book that devotes a third of its pages to references. A good, interesting work that could probably have been improved by an editor's scissors, trimming the writing down to the core bits about computers and setting aside many of the well-researched chapters to attain a focus that is lacking.
Monday, July 22, 2013
Review: Patterns for Cache Optimizations on Multi-processor Machines (ParaPLoP '10)
In this work, they explore three patterns of cache (mis-)use on modern processors. The first pattern is termed "loop interchange", named for its solution. In this pattern, the program does not access data with spatial locality. Instead of accessing every element in a cache line, the program has a different ordering and only touches a subset of the cache line, while later (after the line has been evicted) it accesses other subsets. In the example below, assume that N and M are both quite large (say 1+ million), so this code will likely have significant cache misses (at minimum L1 misses), while switching the "i" and "j" for loops (i.e. interchange) will considerably reduce the number of cache misses.
int X[N][M];
for (j = 0; j < M; j++)
    for (i = 0; i < N; i++)
        f(X[i][j]); // Any set of operations applied to this element.
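As a sketch of the interchange itself, the two functions below visit the same elements in the two loop orders. The array sizes are hypothetical (kept small so the example runs anywhere), and a simple sum stands in for f; neither choice comes from the paper.

```c
#include <stddef.h>

enum { N = 512, M = 512 };  /* hypothetical sizes; the text assumes far larger */
static int X[N][M];

/* Original order: the inner loop walks down a column, so consecutive
   accesses are M ints apart and each touches a different cache line. */
long sum_column_order(void)
{
    long total = 0;
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < N; i++)
            total += X[i][j];
    return total;
}

/* Interchanged order: the inner loop walks along a row, so consecutive
   accesses are adjacent in memory and use every element of a line. */
long sum_row_order(void)
{
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            total += X[i][j];
    return total;
}
```

Both functions compute the same result; only the memory access order differs, which is why the interchange is safe whenever the per-element operations are independent of each other.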
The next pattern is false sharing. Threads in a program intentionally and unintentionally share data. Data structures are written by programmers to logically group data; however, the grouping and structuring of the data is often made for the programmer's benefit and not for algorithmic need. The hardware is expecting locality from arrays and data structures. When multithreaded, the cache line is the unit by which the hardware tracks sharing of data. So if different threads write to different data in the same cache line, then the hardware treats the writes as being made to the same thing, which forces the line to move between the threads' caches rather than staying cached in one. The usual recommendation for solving this problem is to pad the data, so that the software notion (int) and hardware notion (cache line) are the same size.
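int X[N];
void* thread_work(int tid)
{
    // Adjacent elements belong to different threads, so every cache
    // line of X is written by several threads.
    for (int i = 0; i < N; i++)
        if (i % num_threads == tid)
            X[i] = do_work(X[i]);
    return NULL;
}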
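A minimal sketch of the padding recommendation follows, using C11 alignas so that each element occupies its own cache line. The 64-byte line size, the padded_int type, the element count, and the stand-in do_work are all assumptions for illustration.

```c
#include <stdalign.h>

#define CACHE_LINE 64  /* assumed line size; check the target processor */
#define NUM_SLOTS 8    /* hypothetical element count */

/* Aligning the member to a cache line also pads the struct to a full
   line, so neighboring elements never share one. */
typedef struct {
    alignas(CACHE_LINE) int value;
} padded_int;

_Static_assert(sizeof(padded_int) == CACHE_LINE,
               "each element should fill exactly one cache line");

static padded_int X[NUM_SLOTS];

/* Stand-in for the real per-element work. */
static int do_work(int v) { return v + 1; }

/* Same interleaved assignment of elements to threads as before, but
   each write now lands in a cache line no other thread writes. */
void thread_work(int tid, int num_threads)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (i % num_threads == tid)
            X[i].value = do_work(X[i].value);
}
```

The cost is space: each int now consumes a full line. Giving each thread a contiguous block of elements is a common alternative that avoids most of the sharing without padding.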
This second example goes beyond the paper's scope for false sharing. Common data structures may also have different sharing patterns for each element. For example, in this data structure, the following fields are written to: encoding, sum, weight_left, and weight_right. The rest are read-only. Currently the data structure uses two cache lines (as all fields are 8 bytes in size). If the structure were rearranged so that the written fields were in one cache line and the read-only fields in the second line, then updates by any thread would only result in one cache miss rather than two. Padding may be required, but the key insight here is arranging data by sharing pattern, which is a generalization of the previous paragraph.
typedef struct _node {
    graph_value_t value, encoding;
    unsigned long long sum;
    struct _edge* fwd;
    struct _edge* back;
    // tree sorted by value
    struct _node* left;
    struct _node* right;
    // tree sorted by encoding
    struct _node* weight_left;
    struct _node* weight_right;
} node, *pnode;
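One possible rearrangement is sketched below, grouping the written fields into the first cache line and the read-only fields into the second. The 64-byte line size and the definition of graph_value_t as a 64-bit integer are assumptions (the text only says the fields are 8 bytes), and the alignas padding is one way among several to force the split.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64            /* assumed cache line size */
typedef uint64_t graph_value_t;  /* assumption: some 8-byte value type */

struct _edge;                    /* defined elsewhere in the program */

typedef struct _node {
    /* First line: fields that threads write. */
    alignas(CACHE_LINE) graph_value_t encoding;
    unsigned long long sum;
    // tree sorted by encoding
    struct _node* weight_left;
    struct _node* weight_right;

    /* Second line: fields that are read-only after construction. */
    alignas(CACHE_LINE) graph_value_t value;
    struct _edge* fwd;
    struct _edge* back;
    // tree sorted by value
    struct _node* left;
    struct _node* right;
} node, *pnode;
```

The structure grows to two full lines, but an update now dirties only the first line, and readers of the second line are not disturbed by those writes.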
The final pattern explored in the paper is data alignment. Ignoring the issue of misaligned accesses, let's look at misaligned allocations. Suppose we allocate an array of 48-byte data structures in a multithreaded program. Sometimes accessing an element is one cache miss, but sometimes it is two. The runtime system has packed the data structures together, with 4 fitting in 3 cache lines. In general, when you allocate data structures, they come with the same alignment as in the array, made to a 16-byte boundary, but this boundary is not guaranteed to be the start of a cache line. The primary solution is to use support calls that change the allocation alignment. This may waste space, but now the allocation comes using our expected number of cache lines. And by using the lines we expect, we can tailor the program to the architecture and observe the expected performance characteristics.
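As a sketch of such a support call, C11's aligned_alloc can request cache-line alignment, and padding the element to a full line keeps every element in exactly one line. The item and padded_item types, the helper names, and the 64-byte line size are assumptions for illustration; posix_memalign is a comparable call on POSIX systems.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed cache line size */

/* Hypothetical 48-byte structure, as in the text's example. */
typedef struct {
    char payload[48];
} item;

/* Padded to a full line: 16 bytes are wasted per element, but no
   element ever straddles two cache lines. */
typedef struct {
    alignas(CACHE_LINE) item data;
} padded_item;

/* Allocate count elements starting on a cache line boundary. The size
   passed to aligned_alloc is a multiple of the alignment because
   sizeof(padded_item) is. Release the result with free(). */
padded_item *alloc_items(size_t count)
{
    return aligned_alloc(CACHE_LINE, count * sizeof(padded_item));
}

/* True when p sits at the start of a cache line. */
int starts_cache_line(const void *p)
{
    return (uintptr_t)p % CACHE_LINE == 0;
}
```

With the default allocation, the array would only be guaranteed the 16-byte alignment described above, so whether a given 48-byte element spans two lines would depend on where the array happened to start.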
The patterns are three simple ones that architects and performance minded programmers have known for years. I am pleased to see them being reiterated, but the response may be like that from the developer after I changed his code per these patterns years ago, "Why can't the compiler just do that for me?!"
Thursday, July 11, 2013
Book Review: Exceptional C++ Style 40 New Engineering Puzzles, ...
Were I forced to read a book directly on the C++ standards, let's just say I can think of lots of things I'd rather be doing. But while Exceptional C++ Style: 40 New Engineering Puzzles, Programming Problems, and Solutions exposed me to more of the standards, I never felt like I was reading quotes from the standard. Instead, I was listening to an interesting conversation about some real programming questions that just may require invoking the standards to answer.
I enjoyed chapters 20 and 21, as I appreciate the effort toward explaining how memory is allocated and structures laid out. They help dispel the misconception that new / malloc are how the OS provides memory. And I then learned that new will throw an exception instead of returning NULL. Perhaps it is time to rewrite some code. Furthermore, I understand now why most C++ code uses preincrement on iterators.
It is not strictly a book on style, but instead this tome covers the style I care most about: good programming practices. I don't care which style convention you use, so long as you use one consistently. But for whatever style your code has, it had better be good code.
I recommend reading this book even if you do not regularly use C++. I will note that it is dated; however, unless you are now using C++11, the text is still timely and even if you are using C++11 I doubt that everything has changed (though I did notice the discussion of the auto keyword was out of date).
Friday, November 2, 2012
Computer Science Education: Student Conceptions
Who would be considered a Computer Scientist?
How does a student choose his or her particular focus in Computer Science?
Do misconceptions about Computer Science affect the student's education?
Students have three conceptions about what Computer Science is, which were first derived from interviews and reaffirmed through a 100 student survey. The Theory-view (8%) is that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) is that all Computer Science is about programs, where one is either analyzing the basis of or the direct work on programming. The Broad-view (27%) is that Computer Science is a giant umbrella of disciplines where computers are involved. A final 23% of the survey responses were without clear category, which may be due to the limitation of the original interviews. All conceptions view algorithms and data structures as a vital component to Computer Science.
Students use enjoyment of classes as the dominant metric of whether they have an affinity for the area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, which would commonly be viewed as an unpleasant time. Since students use enjoyment as their metric, students are not particularly affected by their misconceptions about what the course(s) contains.
Clearly then, the enjoyability of a course can affect the taking of subsequent courses. Future work may explore whether factors in course enjoyment (like scheduled time) have an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum as providing an adequate preparation for practicing Computer Science, which is to say that they are prepared as long as they satisfy the requirements regardless of any attempt to have a focus in their course selection. Should the curriculum then have unpleasant courses to force students into specializing?
For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.
Wednesday, September 26, 2012
Book Review: Coders at Work
Notable quotes follow:
"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch
Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
"Types are essentially assertions about a program." - Dan Ingalls
"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix
"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen
"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell