Bayesian Statistics in Software Engineering: Practical Guide and Case Studies

Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics has dominated data analysis in the past; but Bayesian statistics is making a comeback at the forefront of science. In this paper, we give a practical overview of Bayesian statistics and illustrate its main advantages over frequentist statistics for the kinds of analyses that are common in empirical software engineering, where frequentist statistics still is standard. We also apply Bayesian statistics to empirical data from previous research investigating agile vs. structured development processes, the performance of programming languages, and random testing of object-oriented programs. In addition to being case studies demonstrating how Bayesian analysis can be applied in practice, they provide insights beyond the results in the original publications (which used frequentist statistics), thus showing the practical value brought by Bayesian statistics.

[1]  Susan A. Murphy,et al.  Monographs on statistics and applied probability , 1990 .

[2]  Walter F. Tichy,et al.  Case study: extreme programming in a university environment , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[3]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2000, The Kluwer International Series in Software Engineering.

[4]  Capers Jones,et al.  Function points as a universal software metric , 2013, SOEN.

[5]  Mauro Pezzè,et al.  Proceedings of the 38th International Conference on Software Engineering , 2016, ICSE.

[6]  35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013 , 2013, ICSE.

[7]  David Barber,et al.  Bayesian reasoning and machine learning , 2012 .

[8]  H. Hulkko,et al.  A multiple case study on the impact of pair programming on product quality , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[9]  Alberto Bacchelli,et al.  Expectations, outcomes, and challenges of modern code review , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[10]  Tim Menzies,et al.  Perspectives on Data Science for Software Engineering , 2016, Perspectives on Data Science for Software Engineering.

[11]  Yves Le Traon,et al.  Trivial Compiler Equivalence: A Large Scale Empirical Study of a Simple, Fast and Effective Equivalent Mutant Detection Technique , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[12]  Harald C. Gall,et al.  The impact of test case summaries on bug fixing performance: an empirical investigation , 2016, PeerJ Prepr..

[13]  Irving John Good,et al.  Explicativity, corroboration, and the relative odds of hypotheses , 1975, Synthese.

[14]  Natalia Juristo Juzgado,et al.  Reporting experiments to satisfy professionals’ information needs , 2014, Empirical Software Engineering.

[15]  Miryung Kim,et al.  The Emerging Role of Data Scientists on Software Development Teams , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[16]  Matthew P. Wand,et al.  Kernel Smoothing , 1995 .

[17]  N. Lazar,et al.  The ASA Statement on p-Values: Context, Process, and Purpose , 2016 .

[18]  Lu Zhang,et al.  An Empirical Comparison of Compiler Testing Techniques , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[19]  Gabriele Bavota,et al.  An empirical study on the developers' perception of software coupling , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[20]  Nachiappan Nagappan,et al.  Evaluating the efficacy of test-driven development: industrial case studies , 2006, ISESE '06.

[21]  Michael Pradel,et al.  Performance Issues and Optimizations in JavaScript: An Empirical Study , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[22]  Krzysztof Czarnecki,et al.  Mining configuration constraints: static analyses and empirical results , 2014, ICSE.

[23]  Marian Petre,et al.  UML in practice , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[24]  A. Yamashita,et al.  Exploring the impact of inter-smell relations on software maintainability: An empirical study , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[25]  Paul Ralph,et al.  Grounded Theory in Software Engineering Research: A Critical Review and Guidelines , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[26]  S. Goodman Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy , 1999, Annals of Internal Medicine.

[27]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[28]  Leif D. Nelson,et al.  False-Positive Psychology , 2011, Psychological science.

[29]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[30]  Lionel C. Briand,et al.  Random Testing: Theoretical Results and Practical Implications , 2012, IEEE Transactions on Software Engineering.

[31]  36th International Conference on Software Engineering, ICSE '14, Hyderabad, India - May 31 - June 07, 2014 , 2014, ICSE.

[32]  Bertrand Meyer,et al.  Agile vs. structured distributed software development: A case study , 2012, 2012 IEEE Seventh International Conference on Global Software Engineering.

[33]  David S. Rosenblum,et al.  Perturbation analysis of stochastic systems with empirical distribution parameters , 2014, ICSE.

[34]  Andrew Begel,et al.  Pair programming: what's in it for me? , 2008, ESEM '08.

[35]  Alessandro F. Garcia,et al.  Trading robustness for maintainability: an empirical study of evolving c# programs , 2014, ICSE.

[36]  Jacob Cohen The earth is round (p < .05) , 1994 .

[37]  C. Guure,et al.  Bayesian Estimation of Two-Parameter Weibull Distribution Using Extension of Jeffreys' Prior Information with Three Loss Functions , 2012 .

[38]  Carlo A. Furia,et al.  A Comparative Study of Programming Languages in Rosetta Code , 2014, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[39]  Alessandro Orso,et al.  X-PERT: Accurate identification of cross-browser issues in web applications , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[40]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[41]  Rui Zhang,et al.  An Empirical Study of Practitioners' Perspectives on Green Software Engineering , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[42]  Sven Apel,et al.  Views on Internal and External Validity in Empirical Software Engineering , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[43]  Zhendong Su,et al.  An Empirical Study on Real Bug Fixes , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[44]  Stefan Hanenberg,et al.  An Empirical Study on the Impact of C++ Lambdas and Programmer Experience , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[45]  Premkumar T. Devanbu,et al.  Belief & Evidence in Empirical Software Engineering , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[46]  Ayse Basar Bener,et al.  If it is software engineering, it is (probably) a Bayesian factor , 2016, Perspectives on Data Science for Software Engineering.

[47]  Adam Wojciechowski,et al.  Comparison of CMM Level 2 and eXtreme Programming , 2002, ECSQ.

[48]  37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1 , 2015, ICSE.

[49]  Bertrand Meyer,et al.  What good are strong specifications? , 2012, 2013 35th International Conference on Software Engineering (ICSE).

[50]  Jacob Cohen Statistical Power Analysis for the Behavioral Sciences , 1969, The SAGE Encyclopedia of Research Design.

[51]  Lionel C. Briand,et al.  A practical guide for using statistical tests to assess randomized algorithms in software engineering , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[52]  J. Ioannidis Why Most Published Research Findings Are False , 2005, PLoS medicine.

[53]  D. Ross Jeffery,et al.  Empirical Study Towards a Leading Indicator for Cost of Formal Software Verification , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[54]  Liang Gong,et al.  Predicting bug-fixing time: An empirical study of commercial software projects , 2013, 2013 35th International Conference on Software Engineering (ICSE).