On the application of search-based techniques for software engineering predictive modeling: A systematic review and future directions

Abstract Software engineering predictive modeling involves constructing models, with the help of software metrics, for estimating quality attributes. Recently, the use of search-based techniques has gained importance, as they help developers and project managers identify optimal solutions for developing effective prediction models. In this paper, we perform a systematic review of 78 primary studies from January 1992 to December 2015 that analyze the predictive capability of search-based techniques for ascertaining four predominant software quality attributes, i.e., effort, defect proneness, maintainability and change proneness. The review analyses the effective use and application of search-based techniques by evaluating the specification of fitness functions, parameter settings and validation methods, the accounting for their stochastic nature, and the evaluation of the developed models with well-known statistical tests. Furthermore, we compare the effectiveness of models developed using the various search-based techniques, both amongst themselves and against the prevalent machine learning techniques used in the literature. Although very few studies use search-based techniques for predicting maintainability and change proneness, we found that the results of applying search-based techniques to effort estimation and defect prediction are encouraging. Hence, this comprehensive study and its associated results provide guidelines to practitioners and researchers and will enable them to make proper choices when applying search-based techniques to their specific situations.
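To make the terms in the abstract concrete, the following is a minimal sketch, not taken from any of the reviewed studies, of a search-based effort model. It assumes a hypothetical toy dataset, a COCOMO-style model form effort = a * size^b, mean magnitude of relative error (MMRE) as the fitness function, and plain stochastic hill climbing as the search technique. All names (`PROJECTS`, `mmre`, `hill_climb`, `repeated_runs`) and parameter settings are illustrative assumptions.

```python
import random
import statistics

# Hypothetical toy data: (size in KLOC, actual effort in person-months).
PROJECTS = [(10.0, 24.0), (25.0, 62.0), (46.0, 120.0), (70.0, 190.0), (100.0, 280.0)]


def mmre(params, projects):
    """Fitness function: mean magnitude of relative error of effort = a * size**b."""
    a, b = params
    errors = [abs(actual - a * size ** b) / actual for size, actual in projects]
    return sum(errors) / len(errors)


def hill_climb(projects, seed, steps=2000):
    """Stochastic hill climbing over (a, b); returns (best_params, best_fitness)."""
    rng = random.Random(seed)
    best = (rng.uniform(0.5, 5.0), rng.uniform(0.5, 1.5))
    best_fit = mmre(best, projects)
    for _ in range(steps):
        # Perturb the current best solution with small Gaussian noise.
        cand = (
            max(0.01, best[0] + rng.gauss(0, 0.1)),
            max(0.01, best[1] + rng.gauss(0, 0.02)),
        )
        fit = mmre(cand, projects)
        if fit < best_fit:  # keep the candidate only if it lowers the error
            best, best_fit = cand, fit
    return best, best_fit


def repeated_runs(projects, n_runs=30):
    """Repeat the search with different seeds to account for its stochastic nature."""
    return [hill_climb(projects, seed)[1] for seed in range(n_runs)]


if __name__ == "__main__":
    fits = repeated_runs(PROJECTS)
    print(f"median MMRE over {len(fits)} runs: {statistics.median(fits):.3f}")
```

Because each run starts from a different random point, a single run says little about the technique. The distribution of fitness values over many runs is what a statistical test would compare against another technique, ideally on data held out from the search rather than on the training projects used here.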
