On an optimal analogy-based software effort estimation

Abstract Context: An analogy-based software effort estimation technique estimates the effort required for a new software project from the total effort spent on completing similar past projects. In practice, the technique struggles to achieve high accuracy when the new project is not similar to any completed project. In this case, accuracy relies heavily on a process called effort adaptation, in which the difference between the new project and its most similar past projects is quantified and transformed into a difference in effort. Past attempts at effort adaptation used machine learning algorithms; however, no algorithm was able to offer significantly higher performance. Instead, only simple heuristics, such as scaling the effort according to the difference in software size, were adopted. Objective: More recently, million-dollar prize data-science competitions have fostered the rapid development of more powerful machine learning algorithms, such as the gradient boosting machine and deep learning algorithms. Therefore, this study revisits the comparison of software effort adaptors based on heuristics and on machine learning algorithms. Method: A systematic comparison of software effort estimators, all of which were fully optimized by Bayesian optimization, was carried out on 13 standard benchmark datasets. The comparison was supported by robust performance metrics and robust statistical test methods. Conclusion: The results suggest a novel strategy for constructing a more accurate analogy-based estimator by adopting a combined effort adaptor. In particular, the analogy-based model that adapts the effort by integrating the gradient boosting machine algorithm with a traditional adaptation technique based on productivity adjustment performed best in the study. This model significantly outperformed various state-of-the-art effort estimation techniques, including a current standard benchmark algorithmic-based technique, analogy-based techniques, and machine learning-based techniques.
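
To make the idea of effort adaptation concrete, the following is a minimal sketch of an analogy-based estimator that uses the simple size-scaling heuristic mentioned in the abstract. It is an illustration under stated assumptions, not the model evaluated in the study: the function name `estimate_by_analogy`, the dictionary layout, the Euclidean distance, the choice of `k`, and the linear scaling by size ratio are all hypothetical choices made for this example. The combined adaptor based on the gradient boosting machine is not shown.

```python
import math


def estimate_by_analogy(new_project, past_projects, k=2):
    """Estimate effort for new_project from its k most similar past projects.

    Assumed (hypothetical) data layout:
      new_project:   {"features": [float, ...], "size": float}
      past_projects: [{"features": [...], "size": float, "effort": float}, ...]
    Features are assumed to be already normalized to comparable ranges.
    """

    def distance(a, b):
        # Euclidean distance; the similarity measure is an assumption here.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Retrieve the k closest analogues.
    neighbours = sorted(
        past_projects,
        key=lambda p: distance(p["features"], new_project["features"]),
    )[:k]

    # Effort adaptation by size scaling: each analogue's effort is scaled
    # by the ratio of the new project's size to the analogue's size.
    adapted = [
        p["effort"] * new_project["size"] / p["size"] for p in neighbours
    ]

    # Combine the adapted efforts with an unweighted mean.
    return sum(adapted) / len(adapted)


# Made-up illustrative data, not taken from any benchmark dataset.
past = [
    {"features": [1.0, 0.0], "size": 100.0, "effort": 1000.0},
    {"features": [0.9, 0.1], "size": 200.0, "effort": 1600.0},
    {"features": [5.0, 5.0], "size": 50.0, "effort": 900.0},
]
new = {"features": [1.0, 0.0], "size": 150.0}

estimate = estimate_by_analogy(new, past, k=2)
print(estimate)
```

With these made-up numbers, the two nearest analogues contribute adapted efforts of 1500 and 1200, so the estimate is their mean. A learned adaptor would replace the fixed size ratio with a model fitted to the differences between projects.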
