Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System

The scientific study of a phenomenon requires it to be reproducible. Mature engineering industries are recognized by projects and products that are, to some extent, reproducible. Yet reproducibility in software engineering (SE) has not been investigated thoroughly, even though a lack of reproducibility has both practical and scientific consequences. We report a longitudinal multiple-case study of variations and reproducibility in software development, from bidding to deployment, on the basis of the same requirement specification. Of the 81 companies that received a call for tender, 35 responded. Four of them developed the system independently. The firm price, planned schedule, and planned development process had, respectively, "low," "low," and "medium" reproducibilities. The contractor's costs, actual lead time, and schedule overrun of the projects had, respectively, "medium," "high," and "low" reproducibilities. The quality dimensions of the delivered products (reliability, usability, and maintainability) had, respectively, "low," "high," and "low" reproducibilities. Moreover, variability for predictable reasons is also included in the notion of reproducibility. We found that the observed outcome of the four development projects matched our expectations, which were formulated partially on the basis of SE folklore. Nevertheless, achieving more reproducibility in SE remains a great challenge for SE research, education, and industry.
