Wednesday, February 9, 2011

Compiler Intrinsics - The Secret Sauce

Compiler intrinsics are one of the things that C programmers are often unaware.  Sometimes they know that such a functionality should exist, but is suspected to be cloaked in assembly.  Other times, the intrinsic does something otherwise unimagined by the developer.  For this article, I will confine my notation and functionality to that provided by Visual Studio (see MSDN); however, I have every expectation that other platforms / compilers will operate in a similar manner.

The intrinsic provides two things to a programmer.  First, compiler supported functionality might otherwise be unavailable.  Second and related, the compiler has understanding of what each intrinsic does.  The intrinsics are of three flavors: obscured functionality, replicated functionality, and compiler insights.  And in discussing these flavors, I think the provisions will be better understood.

Architectures provide many instructions that are not immediately accessible to the average programmer.  A simple example is counting the bits in a variable.  Such functionality is quite simple for hardware and already provided by the processor, yet I am aware of no language that has notation for such an operation.  Instead, a programmer can invoke __popcnt(var) and be done.  Invoking this intrinsic provides two benefits: first, performance and second correctness.  On Stack Overflow, the equivalent C implementation is suggested as:


int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return ((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}

Replicated functionality steps up one level in complexity.  These operations may not have simple assembly equivalents, yet there still may be advantages from using the intrinsics.  An example is memcpy (in a prior job, I worked extensively to improve the implementation of memcpy so this is particularly dear).  There is a version of memcpy in the C runtime library, which is a good general purpose implementation.  Yet, if the program is not copying variable length data, perhaps inlining the call can provide benefit.  Beyond just avoiding the function call, having a fixed copy size also enables the compiler to generate instructions without the loops, etc that are in the memcpy implementation.  By utilizing the memcpy intrinsic, the compiler can do all of this, whereas without the intrinsic the linker is limited to linking in the function from the separate library.

Many times I've heard someone cite the halting problem and correctly assert that a compiler cannot fully intuit the operation of a program.  Beyond this, even limited analysis of source code is a difficult problem.  To this end, there exist a limited set of intrinsics that provide the compiler with further information regarding program state.  In particular, there is __assume.  This intrinsic posits that some expression is true, which can result in further optimization of the subsequent code.  It is recommended that this intrinsic is paired with asserts, as follows.

#ifdef DEBUG
#define INVARIANT(e) assert(e)
#else
#define INVARIANT(e) __assume(e)
#endif

Compiler intrinsics provide the programmer the opportunity to improve application performance by both simplifying algorithms with better hardware support and by guiding the compiler with a greater understanding of the code's operation.

EDIT: The intrinsics are also termed "builtins" for GCC, see here.

No comments: