Moving Beyond the Mean: Analyzing Variance in Software Engineering Experiments

Software Engineering (SE) experiments are traditionally analyzed with statistical tests (e.g., t-tests and ANOVAs) that assume equally spread data across groups (i.e., the homogeneity of variances assumption). In SE, differences across groups' variances are seen not as an opportunity to gain insights into technology performance but as a hindrance to analyzing the data. We study the role of variance in mature experimental disciplines such as medicine, use simulation to illustrate the extent to which variance can inform about technology performance, and analyze a real-life industrial experiment on Test-Driven Development (TDD) in which variance may affect a technology's desirability. Evaluating technologies based on means alone, as is traditionally done in SE, may be misleading: technologies with which developers achieve more uniform performance (i.e., technologies with smaller variances) may be more suitable when the aim is to minimize the risk of adopting them in real practice.
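The argument that smaller variance can make a technology preferable suggests two concrete analyses: testing whether group variances actually differ, and quantifying that difference as an effect size. Below is a minimal Python sketch of both, using the Brown-Forsythe variant of Levene's test and the log variance ratio (lnVR) common in meta-analysis of variation. The data, group labels, and sample sizes are invented for illustration; this is not the paper's actual analysis or dataset.

import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
# Hypothetical quality scores for two groups of developers (made-up data).
tdd = rng.normal(loc=70, scale=5, size=24)   # Test-Driven Development group
itl = rng.normal(loc=68, scale=12, size=24)  # iterative test-last group

# Brown-Forsythe test: Levene's test with group medians as centers,
# which is robust to departures from normality.
stat, p = levene(tdd, itl, center="median")
print(f"Brown-Forsythe W = {stat:.2f}, p = {p:.4f}")

# Log variance ratio (lnVR) with the usual small-sample bias correction:
#   lnVR = ln(s_T / s_C) + 1 / (2 * (n_T - 1)) - 1 / (2 * (n_C - 1))
s_t, s_c = np.std(tdd, ddof=1), np.std(itl, ddof=1)
n_t, n_c = len(tdd), len(itl)
ln_vr = np.log(s_t / s_c) + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1))
print(f"lnVR = {ln_vr:.2f}  (negative => TDD group less variable)")

A negative lnVR here would indicate that the first group's performance is more homogeneous, which, under the abstract's risk-minimization argument, would favor that technology even if the groups' means were similar.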
