A genetic algorithm based framework for software effort prediction

Background: Several prediction models have been proposed in the literature, using different techniques and obtaining different results in different contexts. Accurate effort prediction for projects is one of the most critical and complex issues in the software industry. Automatically selecting and combining techniques in alternative ways could improve the overall accuracy of the prediction models.

Objectives: In this study, we validate an automated genetic framework and then conduct a sensitivity analysis across different genetic configurations. We then compare the framework with a baseline random guessing approach and with an exhaustive framework. Lastly, we investigate the performance of the best learning schemes.

Methods: Our search space consists of six hundred learning schemes, formed by combining eight data preprocessors, five attribute selectors, and fifteen modeling techniques (8 × 5 × 15 = 600). The genetic framework, through the elitism technique, selects the best learning schemes automatically. The best learning scheme in this context is the combination of data preprocessing + attribute selection + learning algorithm with the highest correlation coefficient. The selected learning schemes are applied to eight datasets extracted from the ISBSG R12 Dataset.

Results: The genetic framework performs as well as an exhaustive framework. The analysis of the standardized accuracy (SA) measure revealed that all of the best learning schemes selected by the genetic framework outperform the baseline random guessing by 45–80%. The sensitivity analysis confirms the stability across different genetic configurations.

Conclusions: The genetic framework is stable, performs better than a random guessing approach, and is as good as an exhaustive framework. Our results confirm previous ones in the field: simple regression techniques with transformations could perform as well as nonlinear techniques, and ensembles of learning machines such as SMO, M5P or M5R could optimize effort predictions.
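To make the search procedure concrete, the following is a minimal Python sketch of a genetic search with elitism over an 8 × 5 × 15 space of learning schemes. It is not the authors' implementation: the abstract does not specify the genetic operators or parameters, so uniform crossover, tournament selection, the mutation rate, and the population size are all assumptions. `toy_fitness` is a hypothetical stand-in for the correlation coefficient that a real framework would obtain by training and evaluating each scheme.

```python
import random
from itertools import product

# Sizes from the abstract: 8 preprocessors x 5 attribute selectors x 15 learners.
N_PRE, N_SEL, N_LEARN = 8, 5, 15
SIZES = (N_PRE, N_SEL, N_LEARN)
SEARCH_SPACE = list(product(range(N_PRE), range(N_SEL), range(N_LEARN)))


def toy_fitness(scheme):
    """Hypothetical fitness; a real framework would return the correlation
    coefficient of the trained learning scheme instead."""
    pre, sel, learn = scheme
    penalty = abs(pre - 5) / N_PRE + abs(sel - 2) / N_SEL + abs(learn - 11) / N_LEARN
    return 1.0 - penalty / 3


def crossover(a, b, rng):
    # Uniform crossover (assumed): each gene is copied from one of the parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(scheme, rng, rate):
    # Each gene is resampled uniformly with probability `rate`.
    return tuple(rng.randrange(n) if rng.random() < rate else gene
                 for gene, n in zip(scheme, SIZES))


def tournament(population, fitness, rng, k=2):
    # Tournament selection (assumed): best of k randomly drawn individuals.
    return max(rng.sample(population, k), key=fitness)


def genetic_search(fitness, pop_size=20, generations=30, elite=2,
                   mutation_rate=0.1, seed=0):
    """Return the best scheme found and the best fitness per generation."""
    rng = random.Random(seed)
    population = [rng.choice(SEARCH_SPACE) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Elitism: the top `elite` schemes pass to the next generation unchanged.
        next_population = ranked[:elite]
        while len(next_population) < pop_size:
            child = crossover(tournament(population, fitness, rng),
                              tournament(population, fitness, rng), rng)
            next_population.append(mutate(child, rng, mutation_rate))
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def exhaustive_search(fitness):
    # Baseline for comparison: evaluate all 600 schemes.
    return max(SEARCH_SPACE, key=fitness)


if __name__ == "__main__":
    best, history = genetic_search(toy_fitness)
    print("genetic best:", best, "fitness:", toy_fitness(best))
    print("exhaustive best:", exhaustive_search(toy_fitness))
```

Because the elite schemes are copied unchanged, the best fitness in the population never decreases from one generation to the next. The genetic search can at most match the exhaustive optimum, which is the kind of comparison the abstract describes.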
