Iterated feature selection algorithms with layered recurrent neural network for software fault prediction

Abstract Software fault prediction (SFP) is used to predict faults in software components, and machine learning techniques (e.g., classification) are widely used to tackle this problem. Because the large volume of data mined from historical software repositories can include features (metrics) that are not correlated with faults, such features can mislead the learning algorithm and degrade its performance. Feature selection (FS) is one way to eliminate those metrics. In this paper, a novel FS approach is proposed to enhance the performance of a layered recurrent neural network (L-RNN), which is used as the classifier for the SFP problem. Three wrapper FS algorithms, namely Binary Genetic Algorithm (BGA), Binary Particle Swarm Optimization (BPSO), and Binary Ant Colony Optimization (BACO), were employed iteratively. To assess the proposed approach, 19 real-world software projects from the PROMISE repository are investigated and the experimental results are discussed. The area under the receiver operating characteristic curve (ROC-AUC) is used as the performance measure. The results are compared, in terms of AUC, with other state-of-the-art approaches, including Naive Bayes (NB), Artificial Neural Network (ANN), logistic regression (LR), k-nearest neighbors (k-NN), and C4.5 decision trees. The results demonstrate that the proposed approach can outperform these existing methods.
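To make the idea of a wrapper FS algorithm scored by ROC-AUC concrete, the following is a minimal sketch of a binary genetic algorithm over feature masks. It is not the paper's implementation: the L-RNN classifier is replaced by a trivial stand-in scorer (the sum of the selected metrics), fitness is measured on the same synthetic data rather than on a hold-out split, and every name and parameter here (`roc_auc`, `fitness`, `binary_ga`, `make_data`, `pop_size`, `p_mut`, and so on) is a hypothetical choice made for illustration.

```python
import random

def roc_auc(labels, scores):
    """AUC as the fraction of (faulty, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined with a single class; return chance level
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(mask, X, y):
    """Wrapper fitness: AUC of a stand-in scorer restricted to the selected metrics."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset is treated as worthless
    # Assumption: the "classifier" is just the sum of the selected metric values.
    scores = [sum(row[j] for j in cols) for row in X]
    return roc_auc(y, scores)

def binary_ga(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Binary GA with tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # keep the full feature set as a baseline individual
    best = max(pop, key=lambda m: fitness(m, X, y))
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=lambda m: fitness(m, X, y))
            p2 = max(rng.sample(pop, 2), key=lambda m: fitness(m, X, y))
            cut = rng.randrange(1, n)  # one-point crossover (needs n >= 2)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, X, y))
    return best, fitness(best, X, y)

def make_data(n_rows=80, seed=1):
    """Synthetic modules: 2 metrics correlated with the fault label, 4 pure-noise metrics."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        informative = [label + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 3.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    mask, auc = binary_ga(X, y)
    print("all metrics AUC:", round(fitness([1] * len(X[0]), X, y), 3))
    print("selected mask:", mask, "AUC:", round(auc, 3))
```

Because the full feature set is seeded into the initial population and the best mask is carried forward unchanged, the selected subset never scores below the all-metrics baseline on the data used for fitness. A faithful wrapper would instead train the actual classifier on each candidate subset and score it on held-out data.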
