The Speedup Test

Code optimisation methods are usually evaluated by performing multiple observations of the initial and the optimised execution times in order to declare a speedup. Even with fixed input and execution environment, program execution times vary in general. Hence, different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many published speedups in the literature are observations from a limited set of experiments. In order to improve the reproducibility of experimental results, this technical report presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (the Shapiro-Wilk test, Fisher's F-test, Student's t-test, the Kolmogorov-Smirnov test, the Wilcoxon-Mann-Whitney test) to study whether the observed speedups are statistically significant or not. By fixing a desired risk level $0 < \alpha < 1$, we can analyse the statistical significance of the speedup of the average execution time as well as of the median; we can also test whether $P(X > Y) > \frac{1}{2}$, where $P(X > Y)$ is the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology defines a consistent improvement compared to the usual performance analysis methods in high performance computing, as in \cite{Jain:1991:ACS,lilja:book}. We explain in each situation which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and distributed as an open source tool based on the R software.
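The testing scheme described above can be sketched as follows. This is a minimal illustration in Python with SciPy, not the Speedup-Test tool itself (which is R-based); the function name and the fallback choices are this sketch's own assumptions. It compares two samples of execution times: the mean speedup is tested with Student's t-test (Welch variant) when both samples pass a Shapiro-Wilk normality check, and the median speedup, i.e. whether $P(X > Y) > \frac{1}{2}$, is tested with the Wilcoxon-Mann-Whitney test.

```python
# Hypothetical sketch of speedup significance testing; not the
# authors' Speedup-Test implementation.
import numpy as np
from scipy import stats


def speedup_is_significant(initial, optimised, alpha=0.05):
    """Test whether the optimised code is significantly faster.

    initial, optimised: samples of execution times (X and Y).
    alpha: desired risk level, 0 < alpha < 1.
    """
    initial = np.asarray(initial, dtype=float)
    optimised = np.asarray(optimised, dtype=float)

    speedup_mean = initial.mean() / optimised.mean()
    speedup_median = np.median(initial) / np.median(optimised)

    # Mean: if both samples look normal (Shapiro-Wilk), use the
    # Welch t-test (no equal-variance assumption); otherwise fall
    # back on a non-parametric location comparison.
    normal = (stats.shapiro(initial).pvalue > alpha
              and stats.shapiro(optimised).pvalue > alpha)
    if normal:
        p_mean = stats.ttest_ind(initial, optimised,
                                 equal_var=False,
                                 alternative='greater').pvalue
    else:
        p_mean = stats.mannwhitneyu(initial, optimised,
                                    alternative='greater').pvalue

    # Median: Wilcoxon-Mann-Whitney tests whether an individual run
    # of the optimised code tends to be faster, i.e. P(X > Y) > 1/2.
    p_median = stats.mannwhitneyu(initial, optimised,
                                  alternative='greater').pvalue

    return {
        'speedup_mean': speedup_mean,
        'speedup_median': speedup_median,
        'mean_significant': p_mean < alpha,
        'median_significant': p_median < alpha,
    }
```

Note that an observed speedup (a ratio above 1) is declared only when the corresponding p-value falls below the chosen risk level $\alpha$; a ratio above 1 alone is not significant, which is precisely the point of the methodology.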

[1] Douglas A. Wolfe et al. Nonparametric Statistical Methods, 1973.

[2] Philip H. Ramsey. Nonparametric Statistical Methods, Technometrics, 1974.

[3] Raj Jain et al. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley Professional Computing, 1991.

[4] Denis Barthou et al. Study of Variations of Native Program Execution Times on Multi-Core Architectures, 2010 International Conference on Complex, Intelligent and Software Intensive Systems, 2010.

[5] Sid Touati. Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques, ArXiv, 2009.

[6] Olivier Temam et al. MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms, 37th International Symposium on Microarchitecture (MICRO-37), 2004.

[7] Lieven Eeckhout et al. Statistically Rigorous Java Performance Evaluation, OOPSLA, 2007.

[8] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1969.

[9] L. Brown et al. Interval Estimation for a Binomial Proportion, 2001.

[10] David J. Groggel. Practical Nonparametric Statistics, Technometrics, 2000.

[11] R. F. et al. Mathematical Statistics, Nature, 1944.

[12] William Feller. An Introduction to Probability Theory and Its Applications, 1967.

[13] Matthias Hauswirth et al. Producing Wrong Data Without Doing Anything Obviously Wrong!, ASPLOS, 2009.

[14] David J. Lilja. Measuring Computer Performance: A Practitioner's Guide, 2000.

[15] D. van Dantzig. On the Consistency and the Power of Wilcoxon's Two Sample Test, Proceedings KNAW Series A, 54 (1951), nr. 1; Indagationes Mathematicae, 13 (1951), pp. 1-8, 1951.

[16] Richard A. Davis et al. Introduction to Time Series and Forecasting, 1998.

[17] H. B. Mann et al. On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other, 1947.

[18] Gilbert Saporta. Probabilités, Analyse des données et statistique, 1991.