Hyperparameter tuning of an ensemble model for software effort estimation

This article presents a method to significantly improve the estimation accuracy of software projects by tuning the hyperparameters of a stacking ensemble model with evolutionary algorithms. Traditional and parametric methods for software effort estimation are often inaccurate because of bias and subjectivity. Machine learning methods can mitigate these issues, provided the data undergoes appropriate pre-processing and feature extraction. Instead of relying on a single machine learning model to estimate project effort, an ensemble of learners is used to improve the estimate. The ensemble performs at its best, with the lowest errors, only when its hyperparameters are set appropriately. Hyperparameters are traditionally set manually for each problem and dataset by trial and error, which is a cumbersome process. In this paper, two evolutionary approaches, Particle Swarm Optimization (PSO) and Genetic Algorithms (GA), are employed to tune the hyperparameters. The stacking ensemble is built on the ISBSG dataset, a heterogeneous collection of software project data from different countries and organizations. Experimental results show that estimation accuracy is higher when the hyperparameters are tuned with PSO.
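The sketch below illustrates the general idea, not the authors' implementation: a scikit-learn StackingRegressor (random-forest and SVR base learners with a ridge meta-learner, chosen here for illustration) whose hyperparameters are tuned by a minimal hand-rolled PSO loop. Since the ISBSG dataset is proprietary, synthetic regression data stands in for it, and the hyperparameter bounds and PSO settings are assumptions.

```python
# Illustrative sketch: PSO-based hyperparameter tuning of a stacking ensemble.
# Synthetic data replaces the ISBSG dataset; bounds and PSO constants are assumed.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

def build_ensemble(params):
    """Map a particle position to a stacking ensemble (hyperparameter choice is illustrative)."""
    n_estimators, max_depth, c = params
    base = [
        ("rf", RandomForestRegressor(n_estimators=int(n_estimators),
                                     max_depth=int(max_depth), random_state=0)),
        ("svr", SVR(C=float(c))),
    ]
    return StackingRegressor(estimators=base, final_estimator=Ridge())

def fitness(params):
    """Mean negative MAE under 3-fold cross-validation (higher is better)."""
    model = build_ensemble(params)
    return cross_val_score(model, X, y, cv=3, scoring="neg_mean_absolute_error").mean()

# Minimal PSO loop over three hyperparameters; bounds chosen for illustration only.
rng = np.random.default_rng(0)
lo, hi = np.array([20, 2, 0.1]), np.array([200, 12, 100.0])
n_particles, n_iters, w, c1, c2 = 8, 10, 0.7, 1.5, 1.5

pos = rng.uniform(lo, hi, size=(n_particles, 3))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("Best hyperparameters (n_estimators, max_depth, C):", gbest)
print("Best CV score (neg MAE):", pbest_fit.max())
```

A GA-based variant would replace the velocity/position update with selection, crossover, and mutation over the same encoded hyperparameter vector and the same cross-validation fitness.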
