A study of performance variations in the Mozilla Firefox web browser

To evaluate software performance and find regressions, many developers use automated performance tests. However, the test results often contain a certain amount of noise that is not caused by actual performance changes in the programs; it is instead caused by external factors such as operating-system decisions or unexpected non-determinism within the programs themselves. This makes interpreting the test results difficult, since results that differ from previous ones cannot easily be attributed to either genuine changes or noise. In this paper we analyze a subset of the factors likely to contribute to this noise, using the Mozilla Firefox browser as an example. In addition, we present a statistical technique for identifying outliers in Mozilla's automated testing framework. Our results show that a significant amount of noise is caused by memory randomization and other external factors, that there is variance in Firefox internals that does not appear to be correlated with test-result variance, and that our suggested statistical forecasting technique detects genuine performance changes more reliably than the one currently used by Mozilla.
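The forecasting-based outlier detection described above can be illustrated with a minimal sketch: fit an exponential-smoothing forecast to the historical benchmark series, then flag a new result when it falls outside a tolerance band derived from past forecast errors. The smoothing constants, the band width `k`, and the use of Holt's linear method are illustrative assumptions for this sketch, not the paper's exact parameters.

```python
def holt_forecast(series, alpha=0.5, beta=0.3):
    """One-step-ahead forecast using Holt's linear (double) exponential
    smoothing. alpha smooths the level, beta smooths the trend; both
    values here are illustrative choices."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

def is_outlier(series, new_value, k=3.0):
    """Flag new_value as a potential genuine performance change when it
    deviates from the forecast by more than k standard deviations of the
    historical one-step forecast errors."""
    # Collect one-step forecast errors over the history.
    errors = [series[i] - holt_forecast(series[:i]) for i in range(2, len(series))]
    mean_e = sum(errors) / len(errors)
    sd = (sum((e - mean_e) ** 2 for e in errors) / len(errors)) ** 0.5
    return abs(new_value - holt_forecast(series)) > k * sd
```

For example, given a stable history of benchmark times, a sudden jump would be flagged while an ordinary fluctuation would not; in practice the band width trades off false alarms (noise flagged as regressions) against missed regressions.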
