The consistency of empirical comparisons of regression and analogy-based software project cost prediction

The objective is to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques, as these are commonly used. We conducted an exhaustive literature search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. Our analysis found that about 25% of studies were internally inconclusive. We also found approximately equal evidence in favour of, and against, analogy-based methods. We confirm the lack of consistency in the findings and argue that this inconsistent pattern across 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just "What is the best prediction system?"
