Enhanced Binary Moth Flame Optimization as a Feature Selection Algorithm to Predict Software Fault Prediction

Software fault prediction (SFP) is a complex problem that developers face throughout the software development life cycle. Collecting data from real software projects, either during the development life cycle or after the product is launched, is not a simple task, and the collected data may suffer from an imbalanced class distribution. In this research, we propose an Enhanced Binary Moth Flame Optimization (EBMFO) combined with Adaptive Synthetic Sampling (ADASYN) to predict software faults. BMFO is employed as a wrapper feature selection method, while ADASYN enhances the input dataset by addressing the class imbalance. This work investigates converting the MFO algorithm from its continuous version to a binary version using transfer functions (TFs) from two different groups (S-shaped and V-shaped), and proposes the EBMFO version. Data from fifteen real projects obtained from the PROMISE repository are employed in this work. Three different classifiers are used: k-nearest neighbors (k-NN), decision trees (DT), and linear discriminant analysis (LDA). The reported results demonstrate that the proposed EBMFO enhances the overall performance of the classifiers, outperforms results reported in the literature, and shows the importance of the choice of TF for feature selection algorithms.
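To make the continuous-to-binary conversion concrete, the following is a minimal Python sketch of the eight S-shaped and V-shaped transfer functions commonly used for binary metaheuristics (the S1 to S4 and V1 to V4 family described by Mirjalili and Lewis for binary PSO), combined with the logarithmic spiral update of the original continuous MFO. Several points are assumptions rather than details taken from the abstract: that the TF is applied to the continuous value produced by the spiral update for each bit, the spiral constant b = 1, and t drawn uniformly from [r, 1] with r decreasing linearly from -1 to -2 over the iterations (as in continuous MFO). The names `S_SHAPED`, `V_SHAPED`, `binarize`, and `binary_moth_update` are hypothetical, and the sketch does not reproduce the specific enhancements of EBMFO, which the abstract does not describe.

```python
import math
import random


def _sigmoid(z):
    """Logistic function, clamped so math.exp cannot overflow."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


# S-shaped family: T(x) is the probability that the bit becomes 1.
S_SHAPED = {
    "S1": lambda x: _sigmoid(2.0 * x),
    "S2": lambda x: _sigmoid(x),
    "S3": lambda x: _sigmoid(x / 2.0),
    "S4": lambda x: _sigmoid(x / 3.0),
}

# V-shaped family: T(x) is the probability that the current bit is flipped.
V_SHAPED = {
    "V1": lambda x: abs(math.erf(math.sqrt(math.pi) / 2.0 * x)),
    "V2": lambda x: abs(math.tanh(x)),
    "V3": lambda x: abs(x / math.sqrt(1.0 + x * x)),
    "V4": lambda x: abs(2.0 / math.pi * math.atan(math.pi / 2.0 * x)),
}


def binarize(x, current_bit, tf_name, rng):
    """Map one continuous value to a bit with the named transfer function."""
    if tf_name in S_SHAPED:
        # S-shaped rule: bit = 1 with probability T(x), else 0.
        return 1 if rng.random() < S_SHAPED[tf_name](x) else 0
    if tf_name in V_SHAPED:
        # V-shaped rule: flip the current bit with probability T(x), else keep it.
        if rng.random() < V_SHAPED[tf_name](x):
            return 1 - current_bit
        return current_bit
    raise ValueError(f"unknown transfer function: {tf_name}")


def binary_moth_update(moth, flame, r, tf_name, rng, b=1.0):
    """Hypothetical binary MFO step: spiral around the flame, then binarize.

    moth, flame: lists of 0/1 feature-selection bits.
    r: lower bound for t; the caller decreases it linearly from -1 to -2.
    b: spiral shape constant (1 in the original MFO).
    """
    new_moth = []
    for m, f in zip(moth, flame):
        t = rng.uniform(r, 1.0)                  # t drawn from [r, 1]
        d = abs(f - m)                           # distance D to the flame
        x = d * math.exp(b * t) * math.cos(2.0 * math.pi * t) + f  # log spiral
        new_moth.append(binarize(x, m, tf_name, rng))
    return new_moth
```

Called once per moth per iteration, for example `binary_moth_update([0, 1, 1, 0], [1, 1, 0, 0], -1.5, "V2", random.Random(42))`, the function returns a new 0/1 mask in which a 1 keeps the corresponding software metric for the classifier. The two groups differ in how they use the probability: S-shaped functions set each bit afresh, whereas V-shaped functions only decide whether to flip the current bit, so a value near zero leaves the moth's bit unchanged.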

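ADASYN (He et al., 2008) oversamples the minority (faulty) class and generates more synthetic modules around minority samples that are harder to learn, namely those whose neighbourhoods contain many majority samples. The following stdlib-only Python sketch follows the published steps: compute the number G of samples to synthesize, the difficulty ratio r_i of each minority sample, normalize the ratios, allocate g_i samples per minority point, and interpolate towards random minority neighbours. Assumptions: Euclidean distance, k = 5 by default, beta = 1.0 (full balance), the optional imbalance-threshold check is omitted, per-sample counts are rounded so the total can differ slightly from G, and a uniform fallback is used when no minority sample has a majority neighbour (a case the weighting leaves undefined). The names `adasyn` and `_knn` are hypothetical; the settings used in the paper are not given in the abstract, and a maintained implementation is available as the `ADASYN` class in imbalanced-learn.

```python
import math
import random


def _knn(point, candidates, k, skip_index=None):
    """Indices of the k nearest candidates to point (Euclidean), excluding skip_index."""
    dists = []
    for idx, c in enumerate(candidates):
        if idx == skip_index:
            continue
        dists.append((math.dist(point, c), idx))
    dists.sort()
    return [idx for _, idx in dists[:k]]


def adasyn(X, y, minority_label, k=5, beta=1.0, rng=None):
    """Return synthetic minority samples generated in the ADASYN style."""
    rng = rng or random.Random()
    min_positions = [i for i, lab in enumerate(y) if lab == minority_label]
    minority = [X[i] for i in min_positions]
    n_min = len(minority)
    n_maj = len(X) - n_min
    G = int(round((n_maj - n_min) * beta))  # total number of samples to synthesize
    if G <= 0 or n_min < 2:
        return []

    # Difficulty ratio r_i: share of majority samples among the k nearest
    # neighbours of each minority sample, searched in the full dataset.
    ratios = []
    for pos in min_positions:
        neigh = _knn(X[pos], X, k, skip_index=pos)
        ratios.append(sum(1 for j in neigh if y[j] != minority_label) / len(neigh))

    total = sum(ratios)
    if total == 0:
        # Assumption: fall back to uniform weights when r_i is zero everywhere.
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / total for r in ratios]
    counts = [int(round(w * G)) for w in weights]  # g_i per minority sample

    # Interpolate between each minority sample and random minority neighbours.
    synthetic = []
    for i, g in enumerate(counts):
        neigh = _knn(minority[i], minority, k, skip_index=i)
        for _ in range(g):
            z = minority[rng.choice(neigh)]
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(minority[i], z)])
    return synthetic
```

Because every synthetic sample lies on a segment between two minority samples, it never leaves the region spanned by the minority class. In an SFP pipeline, oversampling like this is normally applied only to the training portion of each split, so synthetic modules never leak into the evaluation data.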