Wednesday, January 19, 2011

Review: Measurement Bias (ASPLOS'09)

Given that I read 150-200 research papers a year, it seems only reasonable that I point out papers I find particularly interesting.  One of the first papers I read in grad school (and still one of the most interesting) is the subject of this post: Producing Wrong Data Without Doing Anything Obviously Wrong!

We know of many sources of variance in our measurements, such as whether other applications or background processing happen to be running.  There are also second-order factors: where the data is laid out on disk (inner versus outer tracks), or which specific pages of memory are allocated (since that can influence caching).  But these variations are (usually) different from run to run, so by taking many measurements we can recover an accurate picture of performance in which these events occur with their natural frequency.

The paper tests the following comparison: what benefit does -O3 provide over -O2 in gcc?  Beyond the variations above, what factors might affect performance without our being aware of them, particularly factors that do not vary from run to run?  The danger is that such artifacts can produce "wrong data" without our knowing it.  The paper analyzes two of these artifacts: linking order and environment size.  Before taking each in turn, here is a rough sketch of the kind of measurement being compared.
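This is not the paper's harness, just a minimal sketch of the basic experiment: build a benchmark at each optimization level and average several runs.  Here bench.c is a hypothetical stand-in for whatever benchmark is under test.

```python
import statistics
import subprocess
import time

SRC = "bench.c"   # hypothetical benchmark source; substitute your own

def build_and_time(opt_flag, runs=10):
    """Compile SRC at the given optimization level, then time several runs."""
    subprocess.run(["gcc", opt_flag, "-o", "bench", SRC], check=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

for opt in ("-O2", "-O3"):
    mean, stdev = build_and_time(opt)
    print(f"{opt}: {mean:.3f}s +/- {stdev:.3f}s")
```

The point of the paper is that even a careful comparison like this can be biased by details the harness never touches.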

The authors found that changing the order in which object files and libraries are linked into the application produced performance variations of 15%.  On further analysis, they found that certain performance-critical sections of code ended up with different alignments depending on the link order: sometimes the code would fit in one cache line, other times it spanned two.  This bias persists in both gcc and ICC (Intel's compiler).
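A back-of-the-envelope way to look for this bias yourself (not the paper's methodology) is to re-link the same object files in every order and compare timings.  The object file names below are placeholders for whatever makes up the benchmark.

```python
import itertools
import subprocess
import time

OBJECTS = ["main.o", "compute.o", "util.o"]   # placeholder object files

def time_binary(path, runs=5):
    """Take the best of several runs to reduce ordinary run-to-run noise."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([path], check=True)
        samples.append(time.perf_counter() - start)
    return min(samples)

results = {}
for order in itertools.permutations(OBJECTS):
    # Same code, same flags -- only the link order changes.
    subprocess.run(["gcc", *order, "-o", "bench"], check=True)
    results[order] = time_binary("./bench")

best, worst = min(results.values()), max(results.values())
print(f"spread due to link order alone: {(worst - best) / best:.1%}")
```

Any spread reported here comes purely from where the linker happened to place the code, not from anything the programmer or compiler writer did differently.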

Environment size also has unpredictable effects on application performance.  On the UNIX systems tested, environment variables are copied onto the stack before the call into the application's main() function.  As the total size of the environment grows, the alignment of the stack shifts, and this causes performance effects throughout the code.  Some applications are barely affected; others vary by +/- 10%.
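The same kind of sanity check works here: run the identical binary while padding the environment with a dummy variable of increasing length.  The variable name PADDING and the step sizes below are arbitrary choices, not something from the paper.

```python
import os
import subprocess
import time

def time_with_padding(padding_bytes, runs=5):
    """Run the unchanged benchmark with an extra environment variable of the given size."""
    env = dict(os.environ, PADDING="x" * padding_bytes)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True, env=env)
        samples.append(time.perf_counter() - start)
    return min(samples)

# Only the size of the environment (and hence the initial stack layout) changes here.
for size in range(0, 4096, 512):
    print(f"{size:4d} bytes of padding: {time_with_padding(size):.3f}s")
```

Even a longer user name or a deeper working directory changes the environment size, so two people running the "same" experiment can silently get different numbers.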

While these are but two cautionary tales of how small changes to program state can have significant performance impacts, the main take-away is that such effects will always be present.  They call us to action in preparing more diverse experimental setups that can average over these biases, just as conducting multiple runs addresses interrupts, context switches, and the like.
