Systematic Review Study of Decision Trees based Software Development Effort Estimation

The role of decision trees in software development effort estimation (SDEE) has received increased attention across several disciplines in recent years thanks to their predictive power, ease of use, and interpretability. A large number of published studies have investigated the use of decision tree (DT) techniques in SDEE. Nevertheless, the literature still lacks a systematic literature review (SLR) that assesses the reported evidence on DT techniques. This paper addresses five issues: prediction accuracy, performance comparison, suitable conditions for prediction, the effect of the methods employed in association with DT techniques, and DT tools. To carry out this SLR, we performed an automatic search over five digital libraries for studies published between 1985 and 2019. In general, the results of this SLR revealed that most DT methods outperform many other techniques and show an improvement in accuracy when combined with association rules (AR), fuzzy logic (FL), and bagging. Additionally, we observed limited use of DT tools; we therefore suggest that researchers develop more DT tools to promote the industrial use of DT among professionals.
