The Speedup Test

Code optimisation methods are usually evaluated by performing multiple observations of the initial and the optimised execution times in order to declare a speedup. Even with fixed input and execution environment, program execution times vary in general. Hence, different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many published speedups in the literature are observations from a limited set of experiments. In order to improve the reproducibility of experimental results, this technical report presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (the Shapiro-Wilk test, Fisher's F-test, Student's t-test, the Kolmogorov-Smirnov test, the Wilcoxon-Mann-Whitney test) to study whether the observed speedups are statistically significant or not. By fixing a desired risk level $0 < \alpha < 1$, we can analyse the statistical significance of the speedup of the average execution time as well as of the median; we can also test whether $P(X > Y) > \frac{1}{2}$, where $P(X > Y)$ is the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology defines a consistent improvement compared to the usual performance analysis methods in high performance computing, as in \cite{Jain:1991:ACS,lilja:book}. We explain in each situation which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and distributed as an open source tool based on the R software.
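The testing scheme described above can be sketched as follows. This is a minimal illustration in Python with SciPy, not the Speedup-Test tool itself (which is R-based); the function name and the fallback choices are this sketch's own assumptions. It compares two samples of execution times: the mean speedup is tested with Student's t-test (Welch variant) when both samples pass a Shapiro-Wilk normality check, and the median speedup, i.e. whether $P(X > Y) > \frac{1}{2}$, is tested with the Wilcoxon-Mann-Whitney test.

```python
# Hypothetical sketch of speedup significance testing; not the
# authors' Speedup-Test implementation.
import numpy as np
from scipy import stats


def speedup_is_significant(initial, optimised, alpha=0.05):
    """Test whether the optimised code is significantly faster.

    initial, optimised: samples of execution times (X and Y).
    alpha: desired risk level, 0 < alpha < 1.
    """
    initial = np.asarray(initial, dtype=float)
    optimised = np.asarray(optimised, dtype=float)

    speedup_mean = initial.mean() / optimised.mean()
    speedup_median = np.median(initial) / np.median(optimised)

    # Mean: if both samples look normal (Shapiro-Wilk), use the
    # Welch t-test (no equal-variance assumption); otherwise fall
    # back on a non-parametric location comparison.
    normal = (stats.shapiro(initial).pvalue > alpha
              and stats.shapiro(optimised).pvalue > alpha)
    if normal:
        p_mean = stats.ttest_ind(initial, optimised,
                                 equal_var=False,
                                 alternative='greater').pvalue
    else:
        p_mean = stats.mannwhitneyu(initial, optimised,
                                    alternative='greater').pvalue

    # Median: Wilcoxon-Mann-Whitney tests whether an individual run
    # of the optimised code tends to be faster, i.e. P(X > Y) > 1/2.
    p_median = stats.mannwhitneyu(initial, optimised,
                                  alternative='greater').pvalue

    return {
        'speedup_mean': speedup_mean,
        'speedup_median': speedup_median,
        'mean_significant': p_mean < alpha,
        'median_significant': p_median < alpha,
    }
```

Note that an observed speedup (a ratio above 1) is declared only when the corresponding p-value falls below the chosen risk level $\alpha$; a ratio above 1 alone is not significant, which is precisely the point of the methodology.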

[1] Douglas A. Wolfe et al. Nonparametric Statistical Methods, 1973.

[2] Philip H. Ramsey. Nonparametric Statistical Methods, Technometrics, 1974.

[3] Raj Jain et al. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley Professional Computing, 1991.

[4] Denis Barthou et al. Study of Variations of Native Program Execution Times on Multi-Core Architectures, 2010 International Conference on Complex, Intelligent and Software Intensive Systems, 2010.

[5] Sid Touati. Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques, ArXiv, 2009.

[6] Olivier Temam et al. MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms, 37th International Symposium on Microarchitecture (MICRO-37), 2004.

[7] Lieven Eeckhout et al. Statistically Rigorous Java Performance Evaluation, OOPSLA, 2007.

[8] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1969.

[9] L. Brown et al. Interval Estimation for a Binomial Proportion, 2001.

[10] David J. Groggel. Practical Nonparametric Statistics, Technometrics, 2000.

[11] R. F. et al. Mathematical Statistics, Nature, 1944.

[12] William Feller. An Introduction to Probability Theory and Its Applications, 1967.

[13] Matthias Hauswirth et al. Producing Wrong Data Without Doing Anything Obviously Wrong!, ASPLOS, 2009.

[14] David J. Lilja. Measuring Computer Performance: A Practitioner's Guide, 2000.

[15] D. van Dantzig. On the Consistency and the Power of Wilcoxon's Two Sample Test, Proceedings KNAW Series A, 54 (1951), nr. 1; Indagationes Mathematicae, 13 (1951), pp. 1-8, 1951.

[16] Richard A. Davis et al. Introduction to Time Series and Forecasting, 1998.

[17] H. B. Mann et al. On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other, 1947.

[18] Gilbert Saporta. Probabilités, Analyse des données et statistique, 1991.