Systematic literature review of machine learning based software development effort estimation models

Context: Software development effort estimation (SDEE) is the process of predicting the effort required to develop a software system. To improve estimation accuracy, many researchers have proposed machine learning (ML) based SDEE models (ML models) since the 1990s. However, there has been no attempt to analyze the empirical evidence on ML models in a systematic way. Objective: This research aims to systematically analyze ML models from four aspects: type of ML technique, estimation accuracy, model comparison, and estimation context. Method: We performed a systematic literature review of empirical studies on ML models published over two decades (1991-2010). Results: We identified 84 primary studies relevant to the objective of this research. After investigating these studies, we found that eight types of ML techniques have been employed in SDEE models. Overall, the estimation accuracy of these ML models is close to the acceptable level and is better than that of non-ML models. Furthermore, different ML models have different strengths and weaknesses and thus favor different estimation contexts. Conclusion: ML models are promising in the field of SDEE. However, their application in industry is still limited, so more effort and incentives are needed to facilitate their adoption. To this end, based on the findings of this review, we provide recommendations for researchers as well as guidelines for practitioners.
