Elegant C: review

Showing posts with label review. Show all posts

Friday, September 14, 2018

Is C low level?

A recent ACM article, C is Not a Low-Level Language, argues that for all of our impressions that C is close to hardware, it is not actually "low-level". The argument is as follows, C was written for the PDP-11 architecture and at the time, it was a low-level language. As architectures have evolved, C's machine model has diverged from hardware, which has forced processor design to add new features to attain good performance with C, such as superscalar for ILP and extensive branch prediction.
Processors must also maintain caches to hide the memory latency, which require significant logic to maintain coherence and the illusion that the memory is shared between the threads of a process. Furthermore, the compiler is also called upon to find optimization opportunities that may be unsound and definitely require programmer years to implement.

The author repeatedly contrasts with GPUs, while noting that they require very specific problems, or "at the expense of requiring explicitly parallel programs". If you were not keeping track, a GPU requires thousands of threads to match the CPU's performance. The author calls for, "A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model." Which generally sounds like the GPU design.

I appreciate the callouts to C's shortcomings, which it certainly has. The notion that C has driven processor design is odd, yet it does reflect the fact that processors are designed to run current programs fast. And with the programs being written in either C or a language built on C, that forces many programs into particular patterns. I even spent some time in my PhD studies considering a version of this problem: how do you design a new "widget" for the architecture if no programs are designed for widgets to be available?

All to say, I think C is a low-level language, and while its distance from hardware may be growing, there is nothing else beneath it. This is a gap that needs to be addressed, and by a language that has explicit parallel support.

Monday, August 27, 2018

Book Review: Valley of Genius

Valley of Genius is a history of Silicon Valley based on the people who made it. Since the work was based on interviews, I expected that it would read as actual interviews, where the dialog exists between the author and the "genius". Instead, the author was removed from the chapters and instead the entire text consisted of the different participants being quoted. This writing style took sometime to accept. Initially, I wanted to know exactly who each person was and their role in the narration, which given the numbers involved would significantly detract from the story being told. Then I stopped bothering with the names, only looking for them when two (or more) speakers refer to each other. And this aspect is the best of the book, to have the different individuals having a debate in the dialog, otherwise each chapter is just narration. That said, the concern is that there is lost context to the quotes, as the author has explicitly stated the interviews had been spliced together.

All in all I enjoyed the book, but that only merits 3.5/5 stars.

(A free copy of the book was provided through Goodreads.)

Tuesday, July 31, 2018

Book Review: The Art of Application Performance Testing

The Art of Application Performance Testing, covers what it says. The book starts with concepts general to any performance testing, which was interesting to me. Most of the text focuses though on the Application part of the title. The applications here are primarily web-based, or other client-server based setups, and not just the generic "application" referring to any program. That said, I do not work on such applications, so the remainder of the text was of less value to me.

In testing applications, a performance analyst needs to establish a representative workload, which includes the actions to perform, and the combined load. For example, most users logging in to their bank will view their account balance, while others might transfer money or pay a bill. Combined these actions might represent most of the work from users. Then for each unit of server, how many users should be able to perform a mix of those actions, which forms the load.

After establishing the workload, the analyst needs to implement the described workload, which requires a tool that generates the load (either by driving the application itself or replaying a synthetic trace of the load). For those tools, what additional hardware is required to deploy this load? Does the deployment take into account geographic and other user variations (so that the load generation is representative of the user base)? Finally, what tooling and methodology exists for profiling and recording the execution of the workload for present and future analysis?

So I appreciated the content of the book and would recommend it to individuals focusing on testing of user-facing applications.

Wednesday, November 22, 2017

Review: Languages and Code Quality in GitHub

The study is several years old, but was recently reprinted in the Communications of the ACM. In it, they mined GitHub data for active open source projects, collecting the defect and development rates. They classified the defects according to their type, and the development language according to their feature. And they found that language choice matters, marginally. Some types of bug are far more common, such as memory management bugs in C / C++. Functional languages have the lowest rate; however, this analysis is only based on the commit history and does not also analyze development time, or differences in programmers. So the takeaway is that language features do matter, but programmers just write buggy code.

Wednesday, May 3, 2017

PhD Defense - Meeting Tail Latency SLOs in Shared Networked Storage

Today I went to Timothy Zhu's PhD thesis defense. His work is on achieving better sharing of data center resources to improve performance, and particularly to reduce tail latency. He also TA'd for me last fall.

Workloads are generally bursty, and their characteristics are different. Furthermore, they may have service level objectives (SLOs), and the system needs to meet these different objectives. And the system contains a variety of resources that must be shared in some form. It is not sufficient to just divide the bandwidth. Nor can the system measure the latency and try reacting, particularly as bursty workloads do not give sufficient time to react. While each workload has deadlines, it would be too complex to tag request packets with the deadlines for queuing and routing. However, the deadlines can be used to generate priorities for requests.

The system is architected to have storage and network enforcement components to ensure QoS. There is also a controller that receives an initial trace to characterize each workload, and that workload's SLOs. The controller works through a sequence of analyses to successfully place each workload into the overall system.

Effectively, each workload is assigned a "bucket" of tokens, where the bucket size provides the ability to handle bursts and the rate that tokens are added covers the request rate for the workload. Shorter burstier workloads receive large buckets and low rates, while constant workloads with little bursts have high rates and small buckets. In both cases, only when the bucket is empty, is the workload rate-limited in its requests, and these requests receive the lowest priority. Deterministic Network Calculus (DNC) to model the worst-case queue scenarios. This plots two curves: the requesting flow and the service curve, both plotted as tokens by function of window size (dt). The maximum distance between the curves is the maximum latency.

Using three traces: DisplayAds, MSN, and LiveMaps, they tested three approaches: Cake (reactive approach), earliest deadline first, and Timothy's scheme (PriorityMeister). His scheme did significantly better than the others at meeting the SLOs. However, the DNC analysis was based on achieving 100% and not the SLO's 99% (or other percentile success). Depending on the characteristics, there can be significant differences between these guarantees. To model the latency percentiles, Stochastic Network Calculus (SNC) can achieve this; however, the math is significantly more complex. And the math had not previously been applied to this problem. DNC is still better when assuming that bursts are correlated or the system is in an adversarial setting. Reducing these assumptions (uncorrelated workloads), the SNC-based analysis permitted the system to admit 3x workloads versus the DNC analysis.

Workloads have a curve of satisfying bucket sizes and token rate pairs. Many systems require the user to provide its rate limit. Other systems use simple heuristics to either find the "knee of the curve" or select a rate limit as a multiple of the average rate. However, for an individual workload, all pairs are satisfying, it is only when workloads are combined in a system do the different pairs matter. The configurations of the set of workloads on the system can be solved for using a system of linear equations. Therefore, when placing new workloads, the controlling architecture can find successful placements, while potentially reconfiguring the workloads assigned.

One extension would be addressing failure modes. Currently, the system is assumed to be at degraded performance when components have failed.

Monday, May 1, 2017

Book Review: Multicore and GPU Programming: An Integrated Approach

I wanted to like this book, Multicore and GPU Programming: An Integrated Approach, and be able to use it in the classroom. I teach a class where we cover OpenMP, MPI, Cilk, and Cuda. Yet, I would not use this book and I have told the publishers this as well, to which they acknowledged my reasoning.

First, the author elects to use QtThreads for the base parallel programming approach. He notes that he had considered using pthreads or C++11 thread support, and comments that he rejected C++11 threads for the book as the support was incomplete at the time. Pthreads is left without comment. That may be, but I have heard of and used those options, while QtThreads is something that I have never considered.

Second, the book serves as an admirable API reference. For one or a set of function calls, significant space is dedicated to how that call works and illustrating examples for it. Structured Parallel Programming also covers many APIs, yet it maintains a feel of being about parallel programming in general rather that the calls specifically. However, that work covers different API sets so the two are not explicitly comparable.

Third, and this issue is really more for the editors rather than the author, the typesetting on each page is poor. There is significant white space left bordering the text on each page. Furthermore, the code wraps, and often not for being long code, but for long comments and comments on the same line as the code. I understand that in programming these are stylistic choices; however, the impact of finding line wraps from long comments leaves the text looking unprofessional. I must assume that the author wrote the code separately and then provided for being included into the book, but the editor failed to make allowances for typesetting.

In conclusion, I wanted to like and use the book. Whenever I speak with the publisher, they always direct me to it, I just have to hope for something else to come along. Use it as a reference perhaps, but I am cautious in my recommendation.

(This book was provided free by the publisher to review for possible use in the classroom.)

Friday, April 21, 2017

Repost: What Makes a Program Elegant?

In a recent issue of the Communications of the ACM, there was a short article titled, What Makes a Program Elegant? I found it an interesting discussion that has summarized well the characteristics in elegant programming: minimality, accomplishment, modesty, and revelation. Revelation is one that I had not considered before, but I think it is most important of all. There are some code sequences that I have written, which the elegance has rested most of all on its revelation. Using and showing some aspect of computers and programming that I have never seen before, or revealing that there is a modest way to accomplish something new or differently.

Monday, March 13, 2017

Book Review: Optimized C++: Proven Techniques for Heightened Performance

I have spent significant time on performance issues and have been in search of a book that can summarize the diversity of issues and techniques well. I hoped that Optimized C++: Proven Techniques for Heightened Performance would provide some of the guidance I want and
This book is not quite it. There is good material here, yet I found repeatedly thinking that the author was not aware of the past 10(?) years of changes to the field. Not an issue of the book was from the early 2000s, but it was published last year.

A key step in improving the performance of programs is measuring it. There are a variety of techniques for doing so. Tools based on instrumentation and tools based on sampling profiling. I find greater value to using the sampling profiling tools (for measuring performance) due to their lower overhead and ability to pinpoint where in a function this cost exists. Yet the book's focus is limited to gprof-esque approaches. I tell students that this approach is best with deep call trees, which may be a greater issue for C++ programming specifically.

The author is somewhat dismissive to compiler optimizations and emphasizes that his observed benefit has been particularly limited to function inlining. There are many more optimizations, and you should care about them. But again, I wonder if his experience of C++ has been deep call trees that could particularly benefit from inlining.

In a take it or leave it, this work also discourages the use of dynamic libraries. Yes, they impose a performance penalty, but they also provide valuable functionality. It all depends on your use case for whether you should statically or dynamically link your code. Code that is reused by separate executables should be in a dynamic library, as it reduces the memory requirements when running and reduces the effort to patch and update those executables. Components that are only used by a single executable should be statically linked, unless the components are of significant size such that decoupling can still benefit memory usage and the updating process.

The author related that replacing printf with puts to just print a string has performance advantages, due to printf being a complicated "God function". The basic point is valid that printf has significant functionality; however, the anecdote should be taken with a grain of salt. Current compilers will do this optimization (replace printf with puts) automatically.

While most of the work provides small examples, the final chapters on concurrency (?) and memory management do not. The concurrency chapter reads as a reference book, as it lists the various APIs available and what each does. It would be better for the book to assume that the readers are familiar with these calls (as the author does with many other topics) and discuss possible optimizations within this scope.

To conclude, the book is not bad, but I also cannot say it is accurate on every point. Especially with performance, programmers are apt to make prompt design decisions based on "their experience" or "recent publications". Measure your code's performance. Only then can you discern which techniques will provide value.

Thursday, December 29, 2016

Book Review: The Art of Readable Code

I really enjoyed reading The Art of Readable Code. I always enjoy reading books on style, both English writing as well as source code. The earlier chapters were a greater highlight to me, as they focused on the basics of style and particularly were demonstrated through languages with which I am familiar. Some of the later chapters instead were examples particular to languages that I do not use, such as illustrating style pitfalls with JavaScript. The book also had value in showing several approaches and continuing to refactor the code to better meet style and readability.

While the topic is style, this is really more about fundamentally good practices, which may be implemented in one of several ways (e.g., where to put braces, camel case, etc) that is termed style. Between this and the examples within the text, I want to start requiring it of students. I want them to read and see why style actually matters. Or maybe we will just have to wait until they experience why it matters and suffer for it.

Monday, April 13, 2015

Book Review: Structured Parallel Programming

At SIGCSE this year, I spoke with several publishers about possible textbooks. Primarily, for ones that would work in the different classes that I might teach. As well as those that relate to my present research. In this case, I received a copy of Structured Parallel Programming: Patterns for Efficient Computation from the publisher to review and consider for possible future use in a course.

This work is in three parts: the basics of parallelism and performance, the common patterns in which parallelism is expressed, and example implementations of several algorithms. The second part is the core of the work. To show maps, reduce, scatter and gather, stencil, fork-join, and pipeline. But before we learned those details, we would come to key quotes for all that I do:

You cannot neglect the performance of serial code, hoping to make up the difference with parallelism.

And:

[The] performance of scalar processing is important; if it is slow it can end up dominating performance.

Therefore, parallelism is a method for improving the performance of already efficient code.

With both the common patterns, as well as the example implementations, the authors generally provide the source code for each pattern and implementation using Cilk, TBB, and OpenMP. This source is not for casual readers. More involved implementations can stretch for several pages, as the initial implementation and then subsequent refinements are explored. While it serves well as a reference, it may have worked better to focus on one parallelism approach for each section and therefore give further explanation to the code, especially the language features used. And thereby retain the pattern itself rather than becoming a practitioners' reference.

The example implementations (the third part) are perhaps the least interesting for the classroom and potentially the most interesting for practitioners. Clearly, if I was trying to write code similar to one of these problems, I would have an excellent reference and starting point. However, that is quite rarely the case for myself and I suspect most people as well.

If I was teach a parallel programming course, I might consider using this work (although I still have other, similar textbooks to review); however, were I to do so I would be confining my teaching to the first two parts and may even to just 1 parallel programming paradigm. Yes, I will admit that the last parallel programming course I took covered a diversity of paradigms (Cilk, vectorization, GPUs, OpenMP, MPI), yet I would have preferred to focus more on what one or two paradigms are capable of rather than just the taste of many. Parallel programming takes a lot of work to learn and this book is one piece in that effort.

Tuesday, March 10, 2015

Book Review: Geek Sublime: The Beauty of Code, the Code of Beauty

A book of such promise is Geek Sublime: The Beauty of Code, the Code of Beauty, yet it fails. The first third of this work followed the hopeful theme, intertwining the story of programming, its style, and the author's complex relationship with his Indian heritage, desire to be an artist (writer, etc), and his ability to make money programming. An interesting twining that continued to encourage me to read, yet some worrisome signs developed.

The first worry was skipping a long section on how computers work. This *is* my field of Computer Science, so I wasn't interested in reading the basics. Yet worse was noticing some inaccuracies. They established a certain level of understanding in Computer Science that was concerning, particularly in light of the author's non-technical background. Livable, er readable, sure.

It was then the author's intent to show the great intellectual contributions made by Indians. I have no dispute of this point, except that it wasn't actually fitting with the established theme of the work. Finding that Sanskrit has a codified language of great antiquity enabled the author in this quest. Alas, it was long pages that grew increasingly divorced with the first part of the title, "The Beauty of Code" and further focus on the later, "The Code of Beauty". And this code deriving from Indian literary traditions.

In the end, the book concluded. A crescendo of final claims that each read like next page would be the last, until such time as there was really no text left. I learned something about Indian culture by reading this book, except that was not why I read it. I did not gain in what I sought, so I cannot recommend reading it, nor care to provide a link to it.

Saturday, March 7, 2015

Conference Attendance SIGCSE 2015 - Day 2 / 3

I recognize that Day 1 afternoon went “missing”. I presented my poster and that consumed the sum total of my time. While I am happy with all that I achieved with my poster (writing IRB protocol, independent work, analyzing my teaching, et cetera), it was not considered as a finalist for the student research competition (SRC). Yet I received significant feedback and a number of follow-ons that I will have to try to evaluate the next time(s) I teach. I have been doing an excellent job of networking and speaking with my colleagues. And I have seen several exciting techniques to improve my teaching.

BOF2: Preparing Undergraduates to Make the Most of Attending CS Conferences

This was a small Bird of the Feather (BoF), but we discussed some approaches and things to know before attending your first conference, and not just for undergraduates. However, in some cases, undergraduates are attending their local institution and may never have traveled any significant distance. What do they really need to bring?

In traveling, take some time to prepare students. Let them know what to expect. For example, it is okay to miss some paper sessions, and even return to your room entirely. It is okay to ask questions 1:1. Find groups where people are being introduced and join in. Student volunteering, while takes time, also gives an additional individuals that you will know. Use the people you know to introduce you to others at the conference.

MIPSUnit: A Unit Testing Framework for MIPS Assembly

This is just what it sounds. A Ruby based framework that enables writing simple unit tests that will then be applied to a full simulation of the assembly executed.

A Web-based Simulator for Learning Multiprocessor Real-time Scheduling

The presenter(s) were not at this poster, but it showed a high quality interface for seeing the scheduling of threads according to different scheduling policies. The intent here was not to explore races and parallelism, but rather see how scheduling decisions are made in an OS.

Raspberry HadooPI: A Low-Cost, Hands-On Laboratory in Big Data and Analytics

I was not expecting this poster. You are walking along and then see 4 Raspberry Pi's all networked together. Raspberry Pis and HPC?! A small setup, but it is an interesting development that takes advantage of the low cost Pi and still provide an HPC platform for students.

Building the Pascaline: Digital Computing Like It's 1642

Plastic parts all worked together to form replicas of Pascal's mechanical calculator. Interesting and student assembled.

On the Evaluation of Student Team Software Development Projects

Teams of 4 students, approach is evaluated on courses from three years of major (sophomore on up). Teams are formed with CATME (particularly using dissimilar GPAs in a group), as well as partner selection (when possible). Students provide peer evaluations after each stage of the project. Significant data collection looking particularly at what students prefer for to be the evaluation policy (between 100% of grade for the group’s work to 100% of the grade for the individual’s contribution). This question was taken repeatedly throughout the semester, which leads to whether student preferences change? More senior students prefer more weight being attributed to group. The predictor for what grade split is at what point in the course is this surveyed, and effectively as soon as the teams are formed the students prefer to be graded primarily as a group. Follow on study is looking at experience with team projects, trust in the ability to evaluate individual contribution, and other questions. This is a hopeful data point.

What Influences CS Faculty to Adopt Teaching Practices?

How do faculty become aware and why do they try out teaching practices? 66 participants in CS, including chairs, tenure-track faculty, teaching faculty, and Ph.D. student instructors across 36 institutions. First, the mental model of what an instructor does can differ significantly from what the instructor is actually doing. Second, faculty can find out about practices through a variety of approaches, such as self-identifying that there is possible improvement in their teaching. Faculty often trust other faculty like them (researchers to researches, lecturers to lecturers). Third, when adopting a practice, faculty need to evaluate the effectiveness (see also my poster, student feedback, etc). -- My efforts in this have been having different faculty (my recommendation letter writers) view my lectures / teaching, and thereby giving them demonstrations of different practices.

Personalized Attention @ Scale

"We lost the war on cheating" Instead, we have to meet with students such that they are demonstrating their understanding of the code. The requirements of submissions: attribute your sources and understand your submission. Enables students to work together, use all sources, develop interview skills. Enables reuse of assignments. Grading is now 40% correctness / 60% code interview. Rubric for each interview. Students should arrive early and have their laptop ready to present / explain. Students were better able to learn and complete the assignments, as well as feedback for improvement. Students also felt better able to learn the material by being able to collaborate and not constrained by a collaboration policy. There are some stressors, such as TAs having to meet with hundreds of students, as well as their inconsistencies. -- This was perhaps the most exciting new technique that I saw / heard about.

Thursday, March 5, 2015

Conference Attendance SIGCSE 2015 - Day 1 Morning

It is colder here in Kansas City. Fortunately, I will only be outside briefly. Most often I will be networking and continuing my efforts to both become a better teacher, as well as finding an academic job teaching.

This morning, I am focusing on the "Curriculum" track. I am excited by the three papers in this track, the first looks at research, the second is on systems courses, and the last on parallel computing courses. Alas, I was in the hallway track and missed the first work. Perhaps I can find the authors later.

Backward Design: An Integrated Approach to a Systems Curriculum
The goal of systems is "higher level software creation". Computer Science courses are split into Core Tier 1 and Tier 2 (a term from the ACM 2013 curriculum), where the former are taken by all CS majors and the later are only taken by most or some. One issue in the old curriculum was that OS also taught C. In crafting a new curriculum, first establish a vision statement, which can be used in conflict resolution (and also revised). Establish SMART objectives to prepare and build the assessments. The results can be found on github.

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing
Parallel computing is important and important for CS graduates to know. The 2013 ACM Curriculum increased the number of hours that students should take in parallel computing. Part of the recommendations are to place parallel computing into the curriculum and not just as a course. Thus parallelism modules are placed throughout the curriculum (perhaps as early as CS1 or CS2). Find the level of abstraction for a concept and introduce it appropriately. For example, Amdahl's Law in CS1 versus cache coherence in senior-level class. 5 modules of parallelism were established, which have equivalences with the ACM. Each course in the curriculum may have 1 or more modules, which then teaches and reinforces the topics. Even after adding these modules, there has continued to be incremental development and revisions, which have improved student outcomes. The key take away is that it is possible to introduce these recommendations without completely rewriting the curriculum.

In the afternoon, I will be standing with my poster - Using Active Learning Techniques in Mixed Undergraduate / Graduate Courses. Later I will post updates from my afternoon.

Monday, July 14, 2014

Book Review: The Practice of Programming

(contains Amazon affiliate link)
I recently found, The Practice of Programming (Addison-Wesley Professional Computing Series), sitting on the shelf at my local library. I am generally skeptical when it comes to programming books, and particularly those from different decades, but I trusted the name "Brian Kernighan" so I checked the book out.

And I am so glad that I did. From the first chapter that discussed style, I wanted to read more. And the only reason to ever stop reading was to pull out a computer and put these things into practice. I didn't even mind that it wasn't until chapter 7 that performance was discussed. Still, I will readily acknowledge that I disagree with some of statements in the book. Furthermore, there are some parts of the text that are clearly dated, like discussing current C / C++ standards.

I'd like to conclude with a brief code snippet from the work. This code is part of a serializer / deserializer. Such routines are always a pain to write and particularly if you have many different classes / structs that need them. Thus the authors suggest using vargs and writing a single routine that can handle this for you. Here is the unpack (i.e., deserialize) routine:

/* unpack: unpack packed items from buf, return length */
int unpack(uchar *buf, char *fmt, ...)
{
    va_list args;
    char *p;
    uchar *bp, *pc;
    ushort *ps;
    ulong *pl;

    bp = buf;
    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': /* char */
            pc = va_arg(args, uchar*);
            *pc = *bp++;
            break;
         case 's': /* short */
             ps = va_arg(args, ushort*);
             *ps = *bp++ << 8;
             *ps |= *bp++;
             break;
         case 'l': /* long */
             pl = va_arg(args, ulong*);
             *pl = *bp++ << 24;
             *pl |= *bp++ << 16;
             *pl |= *bp++ << 8;
             *pl |= *bp++;
         default: /* illegal type character */
             va_end(args);
             return -1;
         }
     }
     va_end(args);
     return bp - buf;
}

So now we have a little language for describing the format of the data in the buffer. We invoke unpack with a string like "cscl" and pointers to store the char, short, char and long. Hah! That's it. Anytime we add new types, we just to call the pack / unpack.

Does it matter that the variables are only sequences like "pl" or "bp"? No. Variable names should be meaningful and consistent. "i" can be fine for a loop iterator.

We have given up some performance (*gasp*), but gained in the other parts that matter like readability and maintainability. I plan on using this in my current research (but only the unpack, as my serializers are already highly optimized). All in all, I approve of this book and may even someday require it as a textbook for students.

Tuesday, February 4, 2014

Book Review: ARM Assembly Language: Fundamentals and Techniques

Besides reading an average of 5 research papers every week, I also read an average of one book each week. Occasionally those books relate to computers and usually then I'll write about them here. I had realized a couple months ago that I didn't really know anything about ARM processors, besides that they are low power. It seemed remiss to be studying computer architecture and not know one of the modern architectures. Thus I visited my school library and checked out a book.

This is the story of that book - ARM Assembly Language: Fundamentals and Techniques. An interesting book that covered the basic of what I wanted to learn, but the short coming was that it had an expected environment that was different from mine. ARM processors can be found in a greater diversity of devices than say, x86. Yet, I am still thinking about the ARM processor as a drop-in replacement. I look more to devices like Microsoft's Surface or a smartphone, and think about the presence of an OS, etc.

I learned particularly that the ARM instructions have bits to make them predicated. And I realized then that conditional branches are really just predicated instructions. If the predicate(s) are true, then take the branch. Just another perspective on instruction sets. Anyway, I look forward to getting a Raspberry Pi, so I can try out some of what I've learned and get a chance to also work through the assembly generated by compilers.

Monday, September 23, 2013

Book Review: Turing's Cathedral

Recently, I read the work of history, Turing's Cathedral: The Origins of the Digital Universe (Vintage), which is an interesting book that tells of the development of some of the first computers in the United States. It's particular focus is on the founding of the Institute for Advanced Study (IAS), and then John von Neumann's time there. Now, the von Neumann architecture is something I regularly conceptualize and use in teaching. And it was interesting to read of how this architectural model was developed, and why.

In contrast, a significant portion of the book was instead written as a history of the Institute. Given that the Institute provided access to the records used as a significant part of the source material, it is understandable that the author's focus would be so directed. However, it adds to the misleading focus that this work follows.

Of perhaps greater slight is that a work titled "Turing's Cathedral" only features Alan Turing for a small part of the writing. Instead we find greater focus placed on his work and how it fit into the research of that time. Eventually leading von Neumann to explore the usage of electronic digital computers to solve the US military's problems. He, like many European scientists, had left his homeland ahead of Hitler, and these scientists supported work leading to Germany's defeat.

The grand development that von Neumann introduced was making a computer, programmable. Beyond just reconfigurable, the project he lead at the IAS was programmable, the electronic device could store both data as well as codes that were instructions for what the device was to do. Consider that for the next 50 years, programs would be constrained by having to store the instructions in memory, which was often a very limited resource.

So John von Neumann stars in a book titled for Turing, and a book that devotes a third of its pages to references. A good, interesting work that could probably have been improved by an editor's scissors. To trim the writing down to the core bits about computers, and set aside so much of the well researched chapters to attain a focus that is lacking.

Monday, July 22, 2013

Review: Patterns for Cache Optimizations on Multi-processor Machines (ParaPLoP '10)

In order to optimize any code, know that the optimizations "depend on the exact architecture of the machine (processor, memory, etc), the exact version of the compiler, the exact version of the operating system and the particular configuration of the program that [you] are trying to optimize." At minimum, you should establish what your workload is and how its performance is measured, and then have some gauge of what its current performance is (including the distribution of measurements). All of which is covered in Patterns for cache optimizations on multi-processor machines before they are touch on any optimization. And I applaud them for doing so.

In this work, they explore three patterns of cache (mis-)use on modern processors. The first pattern is termed "loop interchange", named for its solution. In this pattern, the program does not access data with spatial locality. Instead of accessing every element in a cache line, the program has a different ordering and only touches a subset of the cache line, while later (after the line has been evicted) it accesses other subsets. In the example below, assume that N and M are both quite large (say 1+ million), so this code will likely have significant cache misses (at minimum L1 misses), while switching the "i" and "j" for loops (i.e. interchange) will considerably reduce the number of cache misses.

int X[N][M];
for (j = 0; j < M; j++)
for (i = 0; i < N; i ++)
f(X[i][j]); // Any set of operations applied to this element.

The next pattern is false sharing. Threads in a program intentionally and unintentionally share data. Data structures are written by programmers to logically group data; however, the grouping and structuring of the data is often made by the programmer and not for algorithmic need. The hardware is expecting locality from arrays and data structures. When multithreaded, the cache line is the unit by which the hardware tracks sharing of data. So if different threads write to different data in the same cache line, then hardware treats the writes as being made to the same thing, which precludes it from caching. The usual recommendation for solving this problem is to pad the data, so that the software notion (int) and hardware notion (cache line) are the same size.

int X[N];
void* thread_work(int tid)
{
for (int i = 0; i < N; i++)
    if (i % num_threads == tid)
    X[i] = do_work(X[i]);
}

This second example goes beyond the paper's scope for false sharing. Common data structures may also have different sharing patterns for each element. For example in this data structure, the following fields are written to: encoding, sum, weight_left, and weight_right. The rest are read-only. Currently the data structure uses two cache lines (as all fields are 8-bytes in size).   If the structure was rearranged so that the written fields were in one cache line and the read-only fields in the second line, then updates by any thread would only result in one cache miss rather than two. Padding may be required, but the key insight here is arranging data by sharing pattern, which is a generalization of the previous paragraph.

typedef struct _node {
    graph_value_t value, encoding;
    unsigned long long sum;
    struct _edge* fwd;
    struct _edge* back;

    // tree sorted by value
    struct _node* left;
    struct _node* right;

    // tree sorted by encoding
    struct _node* weight_left;
    struct _node* weight_right;
} node, *pnode;

The final pattern explored in the paper is data alignment. Ignoring the issue of misaligned accesses, let's look at misaligned allocations. Suppose we allocate an array of 48-byte data structures in a multithreaded program. Sometimes accessing an element is one cache miss, but sometimes it is two. The runtime system has packed the data structures together, with 4 fitting in 3 cache lines. In general, when you allocate data structures, they come with the same alignment as in the array, made to a 16-byte boundary, but this boundary is not guaranteed to be the start of a cache line. The primary solution is to use support calls that change the allocation alignment. This may waste space, but now the allocation comes using our expected number of cache lines. And by using the lines we expect, we can tailor the program to the architecture and observe the expected performance characteristics.

The patterns are three simple ones that architects and performance minded programmers have known for years. I am pleased to see them being reiterated, but the response may be like that from the developer after I changed his code per these patterns years ago, "Why can't the compiler just do that for me?!"

Thursday, July 11, 2013

Book Review: Exceptional C++ Style 40 New Engineering Puzzles, ...

Generally, I do not write code in C++; however, on occasion (like when writing LLVM compiler passes), I am forced into using this language. I also more regularly find myself grading student assignments that have used C++. Particularly reading these assignments, I will be thankful to have read this book and better be able to express how students have violated the standards, as they have done in the past.

Were I forced to read a book directly on the C++ standards, let's just say I can think of lots of things I'd rather be doing. But while Exceptional C++ Style: 40 New Engineering Puzzles, Programming Problems, and Solutions exposed me to more of the standards, I never felt like I was reading quotes from the standard. Instead, I was listening to an interesting conversation about some real programming questions that just may require invoking the standards to answer.

I enjoyed chapters 20 and 21, as I appreciate the effort toward explaining how memory is allocated and structures laid out. Help dispel the ignorance that new / malloc are how the OS provides memory. And I then learned that new will throw an exception instead of returning NULL. Perhaps time to rewrite some code. Furthermore, I understand now why most C++ code uses preincrement on iterators.

It is not strictly a book on style, but instead this tome covers the style I care most about: good programming practices. I don't care which style convention you use, so long as you use one consistently. But for whatever style your code has, it had better be good code.

I recommend reading this book even if you do not regularly use C++. I will note that it is dated; however, unless you are now using C++11, the text is still timely and even if you are using C++11 I doubt that everything has changed (though I did notice the discussion of the auto keyword was out of date).

Friday, November 2, 2012

Computer Science Education: Student Conceptions

I'm sitting in Michael Hewner's thesis defense on "Student Conceptions About the Field of Computer Science". And I think there are some interesting points raised that are worth reflecting on. Let's start with summary questions.

Who would be considered a Computer Scientist?

How does a student choose his or her particular focus in Computer Science?

Do misconceptions about Computer Science affect the student's education?

Students have three conceptions about what Computer Science is, which were first derived from interviews and reaffirmed through a 100 student survey. The Theory-view (8%) is that Computer Science is mostly concerned with a theoretical view of computers, where the mathematical basis and understanding is dominant (although there may then exist a related field of Software Engineering). The Programming-view (41%) is that all Computer Science is about programs, where one is either analyzing the basis of or the direct work on programming. The Broad-view (27%) is that Computer Science is a giant umbrella of disciplines where computers are involved. A final 23% of the survey responses were without clear category, which may be due to the limitation of the original interviews. All conceptions view algorithms and data structures as a vital component to Computer Science.

Students use enjoyment of classes as the dominant metric of whether they have an affinity for the area. Ironically, one of the two dominant courses in my undergraduate education (15-213) was offered at 9am, which would commonly be viewed as an unpleasant time. Since students use enjoyment as their metric, students are not particularly affected by their misconceptions about what the course(s) contains.

Clearly then, the enjoyability of a course can then affect the taking of subsequent courses. Which in future work, there may be an exploration of whether course enjoyment effects (like scheduled time) has an effect on enrollment in follow-on courses in subsequent semesters. Furthermore, many students trust the curriculum as providing an adequate preparation for practicing Computer Science, which is to say that they are prepared as long as they satisfy the requirements regardless of any attempt to have a focus in their course selection. Should the curriculum then have unpleasant courses to force students into specializing?

For myself, I am a holder of the Programming-view (perhaps based on being a paid programmer), as I view Computer Science to be centered on programs and the act of programming. The field is directed toward understanding programs and how to program well. Computer Science is informed in part through Mathematics by providing a basis for understanding of algorithms and data structures, of which programs are fundamentally composed. Many fields, like Bio-Informatics, are related and rely on Computer Science, but are not Computer Science.

Wednesday, September 26, 2012

Book Review: Coders at Work

I thought Coders at Work: Reflections on the Craft of Programming was a much more enjoyable read than the previously reviewed, Masterminds of Programming. Primarily as there wasn't a focus on finding conflict, I could instead enjoy hearing about how each person came to learn programming / computer science and their experiences with working on projects both small and large. By the end of the book, I wanted to go write code on a project, which suggests it was far more inspirational than most books I read.

Notable quotes follow:

"As they say, it's easier to optimize correct code than to correct optimized code." - Joshua Bloch

Several interviewees quote Tony Hoare who said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

"Types are essentially assertions about a program." - Dan Ingalls

"In those days at Bell Labs the amenities were great - you could call a telephone number and get a transcription. You know, leave a message and say you want it written up and it'll appear the next day in your inbox as sheets of paper. And [my coworker], after we talked about the file system on the blackboard for a little while, picked up the phone, called a number, and read the blackboard into the phone. It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn't believe." - Ken Thompson, on the project that became Unix

"[Code is beautiful that has] a simple straightforward solution to a problem; that has some intrinsic structure and obviousness about it that isn't obvious from the problem itself." - Fran Allen

"When I say, 'You don't get credit because the program works. We're going to the next level. Working programs are a given,' they say, 'Oh.'" - Bernie Cosell