Wednesday, October 15, 2014

Richard Lipton - Knuth Prize (Practice) Talk

Richard J. Lipton is the 2014 winner of the Knuth Prize.  His talk today was a summary of the work that led to his receiving the prize.  The talk proved to be a series of short anecdotes, which are difficult to capture, but I've copied down the highlights as best I can.

"Do problems have labels?"  For example, simulate a queue as two stacks, is this a theory problem or system problem?  At the time, the qualifying exams were split by problem types, so labeling it mattered for which exam contained it.  Faculty at Yale were 50/50 split on whether to mix the problem types and instead students would sit for several days of CS questions rather than a theory day, then a systems day, etc.

Finding a separator that divides a planar graph, where the separator has root size (on the order of sqrt(n) vertices).  T(n) <= C*T(n/2)
He explained the result to Knuth, while visiting Tarjan, and Knuth responded "You've ruined my lunch," as the result destroyed the best known algorithms that were being written up, at the time, in Vol. 4.

"Throw away comments are wrong"  Many introductions make inaccurate statements like "non-uniform cannot imply uniform".  There is the work of Karp-Lipton dealing with non-uniform circuits and the uniform nature of algorithms.  The proof was later handed out on tote-bags at CCC 2010.

"Think in High Dimensions"  Binary search in high dimensional space, still logarithmic in the number of elements.  For example, take a planar graph and split it by the intersections, each slab is linear and can be quickly searched.

"Learn New Tools"  Now, one tool is "Probabilistic method" published on June 28, 1974, which shortly thereafter was a Yale seminar.  "By an Elementary Calculation" means to Erdos to use Sterling's approximation, which in one case required taking the approximation to 7 places.  Before learning this method, had been asked about the problem of Extendible Hashing, and had no idea and put it out of mind.  Later asked about it again, and the problem solved easily (or perhaps two days of proofs).

"Guess Right" One problem in solving problems in the community is that we are guessing wrong.  "It is really hard to prove false statements."  Take the problem of detecting whether a sequence of a_nb_m has n = m?  Possible using a multi-pass scan with a probablistic FSM.  Can do with one-way (i.e., single pass)?

"Need a Trick" Solving a problem of vector addition, with fixed counters, with adding and subtracting (where cannot subtract from 0).  1 counter is decidable, 2 counters is not.  But if there is no test for whether the counter is 0.  Proved it takes EXPSPACE-hard.  Pair counters, so add is subtract and vice versa.

"My Favorite Two Results" - Proving that a a^-1 = 1, in long sequence (abaaaba^-1...) can be done in LOGSPACE.  Do so by replacing the a, b with matrices, then modulo prime.  Given the distributed law and applying in any order, prove that it always stops on any expression.

"Future" Old problems, yes.  But dream of finding proofs to math problems that use CS theory tricks.

Wednesday, September 17, 2014

Preparing for Academic Jobs

I went to a recent seminar about the preparation and practice of finding an academic job.  The following summarizes the answers given by the panelists, each of whom was giving his or her opinion.  The short version is that your letters of recommendation are key.  They are the summary of your skills and qualifications by your (future) peers.  The panelists are all research-oriented faculty, which may skew some of the opinions provided.  One quality resource on teaching jobs can be found here.

Most important things in a candidate:
- Publications (some in the right places)
- Letters (don't really lie)
- Fulfilling the needs of the department
- Put "top" school in middle of interview schedule, chance to work out mistakes but not be burned out
- Energized / excited about place
- In 1:1 with faculty, only discuss own research for half of time (~15min)
- Be formal (jacket, etc)
- Prep work with faculty letter writers (explain research, plans, etc)
- Ability to connect across areas (your own area will get you the interview, the other areas will get you the offer)
- Talent, passion, impact in research
- Have a set of questions for 1:1 time of "do you have any questions?"

Things to avoid:
- Wrong / bad job talk (did you target the right audience, and yet convey knowledge in subfield)
- Attitude (arrogance that job is yours, or desperation about finding a job)
- Two interviews in one week

Letters of Recommendation:
- Especially letters from externals
- Prepare a statement of contributions (what have you really done / achieved?)

Things to focus on:
- Take risks in your research
- Network and get your name out so that people are aware of you

Postdoc versus Second Tier:
- Find collaboration and mentoring in a postdoctoral position
- It depends

- Still expect some quality research
- Your research talk is a demonstration of teaching

Deciding on schools to apply:
- Location
- Areas of Focus

- Find packages from previous applicants

Tuesday, September 16, 2014

Atomic Weapons in Programming

In parallel programming, most of the time the use of locks is good enough for the application.  And when it is not, then you may need to resort to atomic weapons.  While I can and have happily written my own lock implementations, it's like the story of a lawyer redoing his kitchen himself.  It is not a good use of the lawyer's time unless he's enjoying it.

That said, I have had to use atomic weapons against a compiler.  The compiler happily reordered several memory operations in an unsafe way.  Using fence instructions, I was able to prevent this reordering, while not seeing fences in the resulting assembly.  I still wonder if there was some information I was not providing.
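
As an illustration of the kind of ordering constraint involved, here is a minimal sketch using C11 atomics.  The names payload, ready, publish and try_consume are hypothetical, and this is not the code from my project.  On x86, release and acquire orderings typically compile to plain loads and stores, which may be consistent with seeing no fence instructions in the assembly even though the compiler is no longer free to reorder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* publishes the payload; starts out false */

/* Writer: the release store keeps the payload write from being
   reordered after the flag write, by the compiler or the hardware. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read from being
   hoisted above the flag check. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In a real program publish and try_consume would run on different threads; without the release / acquire pair, the compiler may legally move the payload access across the flag access.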

Regardless, the weapons are useful!  And I can thank the following presentation for pointing me to the particular weapon that was needed: Atomic Weapons.  I have reviewed earlier work by Herb Sutter and he continues to garner my respect (not that he is aware), but nonetheless I suggest any low-level programmer be aware of the tools that are available, as well as the gremlins that lurk in these depths and might necessitate appropriate weaponry.

Monday, July 14, 2014

Book Review: The Practice of Programming

(contains Amazon affiliate link)
I recently found The Practice of Programming (Addison-Wesley Professional Computing Series) sitting on the shelf at my local library.  I am generally skeptical when it comes to programming books, and particularly those from different decades, but I trusted the name "Brian Kernighan" so I checked the book out.

And I am so glad that I did.  From the first chapter, which discussed style, I wanted to read more.  And the only reason to ever stop reading was to pull out a computer and put these things into practice.  I didn't even mind that it wasn't until chapter 7 that performance was discussed.  Still, I will readily acknowledge that I disagree with some of the statements in the book.  Furthermore, there are some parts of the text that are clearly dated, like the discussion of the then-current C / C++ standards.

I'd like to conclude with a brief code snippet from the work.  This code is part of a serializer / deserializer.  Such routines are always a pain to write, particularly if you have many different classes / structs that need them.  Thus the authors suggest using varargs and writing a single routine that can handle this for you.  Here is the unpack (i.e., deserialize) routine:

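#include <stdarg.h>

typedef unsigned char  uchar;
typedef unsigned short ushort;
typedef unsigned long  ulong;

/* unpack: unpack packed items from buf, return length */
int unpack(uchar *buf, char *fmt, ...)
{
    va_list args;
    char *p;
    uchar *bp, *pc;
    ushort *ps;
    ulong *pl;

    bp = buf;
    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': /* char */
            pc = va_arg(args, uchar*);
            *pc = *bp++;
            break;
        case 's': /* short, high byte first */
            ps = va_arg(args, ushort*);
            *ps = *bp++ << 8;
            *ps |= *bp++;
            break;
        case 'l': /* long, high byte first */
            pl = va_arg(args, ulong*);
            /* cast before shifting so the top byte cannot overflow int */
            *pl = (ulong)*bp++ << 24;
            *pl |= (ulong)*bp++ << 16;
            *pl |= (ulong)*bp++ << 8;
            *pl |= *bp++;
            break;
        default: /* illegal type character */
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return bp - buf;
}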

So now we have a little language for describing the format of the data in the buffer.  We invoke unpack with a string like "cscl" and pointers to store the char, short, char and long.  Hah!  That's it.  Anytime we add new types, we just need to call pack / unpack.
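
For the other direction, here is a minimal sketch of a matching pack routine.  It assumes the same format letters and the same high-byte-first order as unpack; it is written for illustration and is not copied from the book.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef unsigned char  uchar;
typedef unsigned short ushort;
typedef unsigned long  ulong;

/* pack: pack items into buf, high byte first; return length, or -1 */
int pack(uchar *buf, const char *fmt, ...)
{
    va_list args;
    const char *p;
    uchar *bp = buf;
    ushort s;
    ulong l;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': /* char: arrives promoted to int through "..." */
            *bp++ = (uchar)va_arg(args, int);
            break;
        case 's': /* short: also arrives promoted to int */
            s = (ushort)va_arg(args, int);
            *bp++ = (uchar)(s >> 8);
            *bp++ = (uchar)s;
            break;
        case 'l': /* long: caller must pass an unsigned long */
            l = va_arg(args, ulong);
            *bp++ = (uchar)(l >> 24);
            *bp++ = (uchar)(l >> 16);
            *bp++ = (uchar)(l >> 8);
            *bp++ = (uchar)l;
            break;
        default: /* illegal type character */
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(bp - buf);
}
```

A call such as pack(buf, "cscl", 0x01, 0x0203, 0x04, 0x05060708UL) writes 8 bytes.  The caller is responsible for making buf large enough and for passing arguments whose types match the format string, since varargs offers no checking.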

Does it matter that the variables are only sequences like "pl" or "bp"?  No.  Variable names should be meaningful and consistent.  "i" can be fine for a loop iterator.

We have given up some performance (*gasp*), but gained in the other parts that matter like readability and maintainability.  I plan on using this in my current research (but only the unpack, as my serializers are already highly optimized).  All in all, I approve of this book and may even someday require it as a textbook for students.

Thursday, April 3, 2014

The Information Technology Implications of the President's Intelligence Review Panel

Peter Swire gave a Thomas E. Noonan Distinguished Lecture, titled "The Information Technology Implications of the President's Intelligence Review Panel". It was an interesting talk based on his time last fall on the President's 5-person committee charged with reviewing the practices of the intelligence community, partially in response to Snowden's leaks. Many recommendations were made in their 300-page report, including the often cited statement "Section 215 is 'not essential'."

A major theme of the talk was the claim that the "half life of secrets is declining". At one time, something classified would stay that way for 25 or more years. There is now an increasing probability that directly (through leaks) or indirectly (by inference from non-classified sources) a secret will be publicly disclosed. Decisions must now be made by the intelligence community in light of the fact that their actions will likely be revealed in the near future.

Furthermore, there is an offense / defense tension to the gathering of intelligence. In the past, the discovery of a vulnerability in codes (e.g., encryption), etc. would result in orders to change, orders that themselves would likely go undetected by potential foes. But how do you ensure that current systems remain secure when most (90+%) are in the private sector? And how do you clarify the tension whereby e-commerce and dissent are weighed against intelligence gathering and military support (e.g., drones), all of it dominated by cat videos?

How does the United States resolve the tension between promoting a freedom agenda (use of Twitter, etc. in undemocratic countries) and the need for surveillance against foreign and domestic foes? In the past, secrets and intelligence were the actions of nation-states, often gathered on physically separate networks against a background of predominantly local communication. Now, the predominant threat is from individuals (i.e., terrorists) operating against a backdrop of global communication.
Three final points:
  • Increased privacy protections for non-citizens regardless of locale (see PPD-29)
  • ACM/IETF Code of Ethics as relates to confidentiality and security
  • MLAT and the time scales of the treaty versus the internet
I take no stance beyond saying that I recognize that legitimate needs result in a tension and that I found the talk very interesting.

Tuesday, March 18, 2014

Turing Complete x86 Mov

Here is a small work that shows the power (or perhaps horror) of modern assembly languages. I've told people before that an instruction to "subtract and branch if negative" is sufficient for all programming.  But never would I have imagined that x86's mov is also sufficient.  Until you think about what Intel lets you do:
  • Set the instruction pointer
  • Compute arithmetic expressions via address calculation
  • Compare two values by using their corresponding locations
I am skeptical about whether this could work, given aliasing.  Regardless, it is an interesting exercise and I applaud the author.  Now, I can only imagine one of these oddities legitimately arising in programming.  You have all been warned. :-)
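
The comparison item in the list above can be imitated in C with nothing but stores and a load.  This is a minimal sketch of the idea as I understand it; the array mem, its size, and the function name are hypothetical, and the assert is only a bounds check for the sketch, not part of the trick.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 256  /* hypothetical scratch memory size */

static unsigned char mem[MEM_SIZE];

/* equal_by_stores: returns 1 if x == y, else 0, without a compare.
   Store 0 at cell x, store 1 at cell y, then load cell x: the second
   store overwrites the first only when the two addresses coincide. */
int equal_by_stores(size_t x, size_t y)
{
    assert(x < MEM_SIZE && y < MEM_SIZE);
    mem[x] = 0;
    mem[y] = 1;
    return mem[x];
}
```

The result can then be used as an index into a table of addresses, which is how a branch-free selection between two values might be built from loads and stores alone.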

Saturday, March 8, 2014

Conference Attendance SIGCSE 2014 - Day 3

Today the day will be in reverse. We'll start with papers and end with the invited speaker. I have met many attendees and even talked to some of them. Let's start with operating systems and programming languages. With the bonus theme of avoiding using Linux for presentations.

Teaching OS through code review. A unified grading workflow with git: the student submissions are viewed as diffs and the grading is done via online code review. Most students preferred this system over past solutions and tools. The system also supported incremental reviews / checkpoints. The GradeBoard tool is built on Review Board and git.

Virtual graphics card in qemu for teaching device driver design. Graphics is selected such that students would clearly see the results. Providing a device through a virtual machine significantly reduced the difficulties for instructors as well as for students. Minimal time required to restore student "machines" when they break. Most students completed the project versus earlier versions based on kernel intercepts.

A programming language compiler compiler. Earlier versions of the class required teaching Scheme before students could implement their interpreter / compiler. Now based on Java, the tool plcc processes provided lexical and grammar files, so that students can then interface with the Java classes. Plcc only supports LL(1) languages. Students implement simple interpreted languages.

And then it was time to network again, i.e. the hallway session. This continues to be an interesting expense for an introvert, yet it is also the exponential networking exercise. As I get to know more people, it becomes more likely that I find a group in which I know someone and can meet others. I've made progress with knowing the participants in my "field". And I have more inspiration for teaching this summer.