Performance tuning for machine learning-based software development effort prediction models

Software development effort estimation is a critical activity in project management. In this study, we investigated machine learning algorithms in combination with feature transformation, feature selection, and parameter tuning techniques to estimate development effort accurately, and we propose a new model as part of an expert system. We focused on widely used general-purpose algorithms and applied a parameter optimization technique (grid search, GridSearch), feature transformation techniques (binning and one-hot encoding), and a feature selection method (principal component analysis). All models were trained on the ISBSG (International Software Benchmarking Standards Group) datasets and implemented with the scikit-learn package in Python. The proposed model uses a multilayer perceptron as its underlying algorithm. It applies binning to transform continuous features and one-hot encoding to convert categorical data into numerical values, performs feature selection based on principal component analysis, and tunes its parameters with grid search. We show that, in terms of prediction accuracy measured by the mean absolute residual (MAR), our effort prediction model outperforms the other existing models in most comparisons.
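The pipeline described above can be sketched in scikit-learn as follows. This is a minimal illustration, not the authors' configuration: it assumes synthetic data in place of ISBSG records (which are not included here), hypothetical column names (`functional_size`, `max_team_size`, `development_type`, `language_type`), five equal-frequency bins per continuous feature, and an illustrative hyperparameter grid. Because MAR is the mean of |actual − predicted|, it coincides with scikit-learn's mean absolute error, so the grid search is scored with `neg_mean_absolute_error`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Synthetic stand-in for ISBSG project records (hypothetical columns, not real data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "functional_size": rng.uniform(50, 2000, n),          # e.g. function points
    "max_team_size": rng.integers(1, 30, n).astype(float),
    "development_type": rng.choice(["new", "enhancement", "redevelopment"], n),
    "language_type": rng.choice(["3GL", "4GL"], n),
})
# Synthetic effort in thousands of person-hours, with Gaussian noise.
y = (8.0 * X["functional_size"] + 40.0 * X["max_team_size"]) / 1000 + rng.normal(0, 0.5, n)

continuous = ["functional_size", "max_team_size"]
categorical = ["development_type", "language_type"]

preprocess = ColumnTransformer(
    [
        # Binning: each continuous feature becomes 5 equal-frequency bins, one-hot coded.
        ("bin", KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile"), continuous),
        # One-hot encoding turns categorical fields into 0/1 indicator columns.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense matrix so PCA can consume it
)

pipeline = Pipeline([
    ("prep", preprocess),
    ("pca", PCA()),                                        # reduces the encoded columns
    ("mlp", MLPRegressor(max_iter=2000, random_state=0)),  # multilayer perceptron regressor
])

# Illustrative grid (assumed values); each combination is cross-validated.
param_grid = {
    "pca__n_components": [4, 8],
    "mlp__hidden_layer_sizes": [(16,), (32, 16)],
    "mlp__alpha": [1e-4, 1e-2],
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Minimising mean absolute error during the search is the same as minimising MAR.
search = GridSearchCV(pipeline, param_grid, scoring="neg_mean_absolute_error", cv=3)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
mar = np.mean(np.abs(y_test - y_pred))  # mean absolute residual on held-out projects
print(search.best_params_)
print(f"MAR on held-out projects: {mar:.3f}")
```

Placing binning, encoding, and PCA inside the `Pipeline` means they are refit on each cross-validation training fold, so the grid search does not leak information from validation folds. Note that PCA builds new components from linear combinations of the encoded columns rather than keeping a subset of the original ones.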
