Observer Effect and Measurement Bias in Performance Analysis (CU-CS-1042-08)

To evaluate an innovation in computer systems, performance analysts measure execution time or other metrics on one or more standard workloads. A careful analyst minimizes the amount of measurement instrumentation, controls the environment in which measurement takes place, repeats each measurement multiple times, and uses statistical techniques to characterize the data. Unfortunately, even with such a responsible approach, the collected data may be misleading. This paper shows how easily observer effect and measurement bias can produce poor (and thus misleading) data for computer systems. Observer effect occurs when data collection perturbs the behavior of the system being measured. Measurement bias occurs when the particular environment in which the measurement takes place favors some configurations over others. This paper demonstrates that observer effect and measurement bias have a significant impact on performance and can lead to incorrect conclusions; these effects are large enough to easily mislead a performance analyst. Nevertheless, our survey of recent PACT, CGO, and PLDI papers found that they rarely acknowledge these effects or use reliable techniques to avoid them. We describe and demonstrate techniques that help performance analysts identify situations in which they have poor-quality data. These techniques are based on causality analysis and statistics, which the natural and social sciences routinely use to avoid observer effect and measurement bias.
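
To make the statistical side of this concrete, the following is a minimal sketch (not code from the paper) of one way to guard against setup-specific bias: repeat each measurement across several randomized experimental setups and compare configurations using confidence intervals rather than single numbers. The names run_workload, mean_ci, and evaluate are hypothetical, and run_workload here only fakes timings so the sketch runs on its own; in a real experiment it would launch the workload in an environment varied by the setup seed (for example, a randomized environment-variable size or link order).

import random
import statistics

def run_workload(config: str, setup_seed: int) -> float:
    """Stand-in for a timed benchmark run; returns seconds.
    A real implementation would run the workload under `config`
    in an experimental setup derived from `setup_seed`."""
    rng = random.Random(hash((config, setup_seed)))
    base = 10.0 if config == "baseline" else 9.8  # fabricated illustrative means
    return base + rng.gauss(0.0, 0.3)             # fabricated run-to-run noise

def mean_ci(samples, z=1.96):
    """Mean and approximate 95% confidence interval for the mean."""
    m = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return m, (m - z * sem, m + z * sem)

def evaluate(config, n_setups=20, reps=5):
    """Measure `config` several times in each of many randomized setups,
    instead of measuring it many times in a single fixed setup."""
    times = [run_workload(config, s) for s in range(n_setups) for _ in range(reps)]
    return mean_ci(times)

if __name__ == "__main__":
    for config in ("baseline", "optimized"):
        m, (lo, hi) = evaluate(config)
        print(f"{config:9s} mean={m:.3f}s  95% CI=({lo:.3f}, {hi:.3f})")
    # If the two intervals overlap, the data do not justify claiming a speedup.

The design point is that varying the setup turns a potential source of bias into ordinary variance, which the confidence intervals then capture; measuring both configurations in one fixed setup can make a setup artifact look like a real improvement.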
