Monday, March 13, 2017

Book Review: Optimized C++: Proven Techniques for Heightened Performance

I have spent significant time on performance issues and have been in search of a book that summarizes the diversity of issues and techniques well.  I hoped that Optimized C++: Proven Techniques for Heightened Performance would provide some of the guidance I want.
This book is not quite it.  There is good material here, yet I found myself repeatedly thinking that the author was not aware of the past 10(?) years of changes to the field.  That would not be an issue if the book were from the early 2000s, but it was published last year.

A key step in improving the performance of programs is measuring it.  There are a variety of techniques for doing so, broadly split into tools based on instrumentation and tools based on sampling.  I find greater value in the sampling profiling tools (for measuring performance) due to their lower overhead and their ability to pinpoint where in a function the cost exists.  Yet the book's focus is limited to gprof-esque approaches.  I tell students that the gprof approach works best with deep call trees, which may be a greater issue for C++ programming specifically.
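To make the instrumentation side of that distinction concrete, here is a minimal sketch of hand-written instrumentation in C++.  The names ScopedTimer and sum_to are hypothetical and are not from the book.  The timing code runs on every call and can only attribute cost to the whole function, whereas a sampling profiler interrupts the program periodically and can attribute cost to individual instructions.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not from the book): adds the wall-clock time spent
// in a scope to a running total stored under a label.
class ScopedTimer {
public:
    using Clock = std::chrono::steady_clock;
    using Totals = std::map<std::string, std::chrono::nanoseconds>;

    ScopedTimer(Totals& totals, std::string label)
        : totals_(totals), label_(std::move(label)), start_(Clock::now()) {}

    // The destructor runs when the scope exits, so every call pays for
    // two clock reads and a map update. That is the instrumentation overhead.
    ~ScopedTimer() {
        totals_[label_] += std::chrono::duration_cast<std::chrono::nanoseconds>(
            Clock::now() - start_);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Totals& totals_;
    std::string label_;
    Clock::time_point start_;
};

// Example function instrumented by hand. The timer reports the cost of the
// whole function, not which line inside it was expensive.
long sum_to(long n, ScopedTimer::Totals& totals) {
    ScopedTimer timer(totals, "sum_to");
    long sum = 0;
    for (long i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}
```

Calling sum_to(100, totals) returns 5050 and leaves an accumulated duration in totals["sum_to"].  For short functions the timer itself can dominate the measurement, which is part of why I lean toward sampling.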

The author is somewhat dismissive of compiler optimizations and emphasizes that his observed benefit has been largely limited to function inlining.  There are many more optimizations, and you should care about them.  But again, I wonder if his experience of C++ has been deep call trees that could particularly benefit from inlining.

In a take-it-or-leave-it stance, this work also discourages the use of dynamic libraries.  Yes, they impose a performance penalty, but they also provide valuable functionality.  Whether you should statically or dynamically link your code depends on your use case.  Code that is reused by separate executables should be in a dynamic library, as it reduces the memory requirements when running and reduces the effort to patch and update those executables.  Components that are only used by a single executable should be statically linked, unless the components are of significant size such that decoupling can still benefit memory usage and the updating process.

The author relates that replacing printf with puts to just print a string has performance advantages, due to printf being a complicated "God function".  The basic point is valid that printf has significant functionality; however, the anecdote should be taken with a grain of salt.  Current compilers will do this optimization (replace printf with puts) automatically.
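As a small illustration of the two forms in question, here is a sketch with hypothetical function names.  My understanding is that GCC and Clang, with optimization enabled, typically emit a call to puts for the first function, because the format string has no conversion specifiers, ends in a newline, and the return value is unused.  Treat that as something to verify on your own toolchain, for example by inspecting the assembly produced with -S.

```cpp
#include <cstdio>

// Hypothetical example: a plain string ending in '\n' and an ignored return
// value. This is the form an optimizing compiler may rewrite into a puts call.
void greet_printf() {
    std::printf("hello, world\n");
}

// The hand-optimized form the book recommends. puts appends the newline
// itself, so the string literal omits it.
void greet_puts() {
    std::puts("hello, world");
}
```

Both functions write the same line to stdout.  If the generated code for the two is the same, the manual rewrite buys nothing, which is the point of measuring rather than assuming.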

While most of the work provides small examples, the final chapters on concurrency (?) and memory management do not.  The concurrency chapter reads as a reference book, as it lists the various APIs available and what each does.  It would be better for the book to assume that the readers are familiar with these calls (as the author does with many other topics) and discuss possible optimizations within this scope.

To conclude, the book is not bad, but I also cannot say it is accurate on every point.  Especially with performance, programmers are apt to make prompt design decisions based on "their experience" or "recent publications".  Measure your code's performance.  Only then can you discern which techniques will provide value.

Saturday, March 11, 2017

Conference Time: SIGCSE 2017 - Day 2

I started my morning by attending my regular POGIL session.  I like the technique and use it in the classroom.  However, I should probably make the transition, attend the (all / multi-day) workshop, and perhaps get one of those "ask me about POGIL" pins.

Lunch was then kindly provided by the CRA for all teaching-track faculty in attendance.  There is the start of an effort to ultimately prepare a memo to departments for how to best support / utilize us (including me).  One thing for me is the recognition of how to evaluate the quality of teaching / learning.

Micro-Classes: A Structure for Improving Student Experience in Large Classes - How can we provide the personal interactions that are valuable when enrollments are large / increasing?  We have a resource that is scaling - the students.  The class is partitioned into microclasses, with clear physical separation in the lecture room, and each microclass has a dedicated TA / tutor.  Did this work in an advanced (soph / junior) class on data structures?

Even though the same instructor taught both the micro and the control class, the students reported higher scores for the instructor for preparedness, concern for students, etc.  Yet, there was no statistical difference in learning (as measured by grades).

Impact of Class Size on Student Evaluations for Traditional and Peer Instruction Classrooms - How can we compare the effectiveness of peer instruction being used in courses of varying class sizes?  For dozens of courses, the evaluation scores for PI and non-PI classes were compared.  There was a statistical difference between the two sets, particularly in the evaluations of the course and the instructor.  This difference exists even when splitting by course.  This difference does not stem from frequency of course, nor the role of the instructor (teaching, tenure, etc).

Thursday, March 9, 2017

Conference Attendance SIGCSE 2017 - Day 1

Here in Seattle, where I used to live, attending SIGCSE 2017.

Exposed! CS Faculty Caught Lecturing in Public: A Survey of Instructional Practices - Postsecondary Instructional Practices Survey (24 items), 7000 CS faculty invited, about 800 responses. If the evidence is clear that active learning is better for instruction, then we should be doing that more. The overall split for CS was equal between student-centered and instructor-centered (exactly the same average, 61.5). The survey showed clear differences between non-STEM (student) and STEM (instructor). So CS is doing better than its overall group.

Now, to dig into which differences there are in the demographics. The major differences among instructors are for women, and for those with 15 years of experience versus 30, both showing a 5+ point difference between student and instructor centered. However, scores in the 60s are still "whatever" and are not strongly committed. For those who are strongly committed, there are about 20% for each, while the remaining 60% are whatevers.

Investigating Student Plagiarism Patterns and Correlations to Grades - What are some of the patterns of plagiarism, such as copying parts or all, and how do students try to obfuscate their "work"? Data came from 2400 students taking a sophomore-level data structures course. After discarding those assignments with insufficient solution space, four assignments remained from six semesters. A plagiarism detector was used to find likely cases of cheating.

First, even though the assignments remained unchanged, the rate of cases stayed constant. Most cases involved work from prior semesters. About two thirds of students who cheated did so on only one assignment. Second, the rate of cheating on the individual assignments was similar to the partner assignment. Third, while students who cheated did better on those assignments, they did not receive perfect scores, and those cheating did worse in the course than those who did not. Those who took the follow-on course showed a larger grade difference (p=0.00019). Fourth, the analysis used the raw gradebook data that is independent of the detection and result of that detection.

Six detectors were used. Lazy detector (common-case, no comments or whitespace). Token-based (all names become generic, sort functions by token length), in three variants: identical token stream, modified token edit distance, and inverted token index (compute 12-grams and inversely weight how common these are). "Weird variable name" (lowercase, removed underscores). Obfuscation detector (all on one line, long variable names, etc). Fraction of total cases found by each detector: 15.69%, 18.49%, 49.71%, 72.77%, 67.35%, 0.38%.
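To give a feel for the "identical token stream" idea, here is a minimal sketch in C++.  It is my own simplification and not the detector from the talk: the functions normalize_tokens and same_token_stream are hypothetical, keywords are treated like any other name, comments are not stripped, and functions are not sorted by token length.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical normalizer: every name (keywords included, as a simplification)
// becomes "ID", every run of digits becomes "NUM", whitespace is dropped, and
// any other character is kept as a one-character token.
std::vector<std::string> normalize_tokens(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back("ID");
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back("NUM");
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Two submissions match when their normalized token streams are identical,
// so renaming variables or changing whitespace does not hide a copy.
bool same_token_stream(const std::string& a, const std::string& b) {
    return normalize_tokens(a) == normalize_tokens(b);
}
```

With this sketch, "int total = count + 1;" and "int   sum=n+42;" match, while changing the + to a * does not.  The edit distance and 12-gram variants would operate on the same normalized stream but tolerate small insertions and reorderings.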