Monday, February 23, 2015

Compilers and Optimizations, should you care?

Compiler optimizations matter.  Once, while helping a fellow Ph.D. student improve a simulator's performance, I did two things: first, I replaced an expensive data structure with a more efficient one; second, I turned on compiler optimizations.  Together, those two changes made the simulator run 100x faster.

A question posted on Stack Exchange asked, "Why are there so few C compilers?"  The main answer pointed out that any practical C compiler needs to be an optimizing compiler.  Many optimizations occur on every compilation, each gaining the tiniest increment in performance.  While I enjoy discussing them in detail, I generally wave my hands and say that they are good, yet they make debugging difficult.  These optimizations are lumped together under the "optimization level".
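
To make the "optimization level" concrete, here is a small, contrived C example of my own (not from the article).  Compiled at -O0, GCC and Clang emit the loop as written; at -O2, they typically reduce it to a handful of instructions, sometimes to the closed-form expression.

```c
#include <stdio.h>

/* Sums the integers 0..n-1.  At -O0 this compiles to a literal loop;
 * at -O2 and above, GCC and Clang typically vectorize it or collapse it
 * to the closed form n*(n-1)/2 -- the same source, very different code,
 * all controlled by the optimization level.
 */
static long sum_to(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++)
        total += i;
    return total;
}

int main(void)
{
    printf("%ld\n", sum_to(1000000));
    return 0;
}
```

Comparing the assembly from gcc -O0 -S and gcc -O2 -S on this file shows how much work that single flag quietly does.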

In "What Every Programmer Should Know About Compiler Optimizations", we revisit optimizations.  First, the compiler is no panacea and cannot correct for inefficient algorithms or poor data structure choices (although I am party to research on the later).  The article then suggests four points to assist the compiler in its efforts at optimizing the code. 

"Write understandable, maintainable code."  Please do this!  Usually, the expensive resource is the programmer.  So the first optimization step is to improve the programmer's efficiency with the source code.  Remember Review: Performance Anti-patterns and do not start optimizing the code until you know what is slow.

"Use compiler directives."  Scary.  Excepting the inline directive, I have used these less than a half dozen times in almost as many years of performance work.  Furthermore, the example of changing the calling convention is less relevant in 64-bit space where most conventions have been made irrelevant.

"Use compiler-intrinsic functions."  (see Compiler Intrinsics the Secret Sauce)  This can often dovetail with the first point by removing ugly bit twiddling code and putting in clean function calls.

"Use profile-guided optimization (PGO)."  This optimization is based on the dynamic behavior of the program.  Meaning that if you take a profile of the program doing X, and later the program does Y; executing Y can be slower.  The key is picking good, representative examples of the program's execution.

So you have dialed up the optimization level and written understandable code sprinkled with intrinsics.  Now what?  The next step (with which I agree) is to use link-time optimization (LTO) / Link-Time Code Generation (LTCG).  This flag delays many optimizations until the entire program is available at link time.  One of the principles of software performance is that the more of the program that is available to be optimized, the better it can be optimized.  (This principle also applies in computer architecture.)  Thus the linker can find additional and better opportunities than were present in the individual components.
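
A minimal illustration, with invented file and function names: conceptually, the two pieces below live in separate translation units.  Compiled separately, the call to scale() cannot be inlined because its body is invisible to main.c; built with LTO (gcc/clang -flto, or MSVC's /GL plus /LTCG), the tools see both units at link time and can inline and simplify across the boundary.

```c
/* scale.c -- a tiny function in its own translation unit. */
double scale(double x)
{
    return x * 2.0;
}

/* main.c -- without LTO, the compiler must emit a real call to scale(),
 * since it cannot see the body above.  With LTO, optimization is rerun
 * over the whole program at link time, so the call can be inlined and
 * the loop simplified.
 */
double scale(double x);     /* would normally come from a header */

double total(const double *xs, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += scale(xs[i]);
    return sum;
}
```

With GCC, for instance, the difference is as simple as building with gcc -O2 scale.c main.c versus gcc -O2 -flto scale.c main.c.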

The article notes, "The only reason not to use LTCG is when you want to distribute the resulting object and library files."  And alas, I have fought several battles to overcome this point, as my work requires the use of LTO.  Perhaps in the next decade, LTO will be standard.

Wednesday, February 18, 2015

Going ARM (in a box)

ARM, that exciting architecture, is ever more available for home development.  At first, I was intrigued by the Raspberry Pi's low price point.  I have one, yet three things made it difficult for my work: the ARMv6 ISA, the single core, and the 512MB of RAM.  For my purposes, NVIDIA's development board served far better.  At present, that board is no longer available on Amazon; however, I have heard rumors of a 64-bit design being released soon.

With the release of the Raspberry Pi 2, many of my concerns have been allayed.  I am also intrigued by the possibility it offers of running Windows 10.