A Combined-Learning Based Framework for Improved Software Fault Prediction

Software Fault Prediction (SFP) is found to be vital to predict the fault-proneness of software modules, which allows software engineers to focus development activities on fault-prone modules, thereby prioritize and optimize tests, improve software quality and make better use of resources. In this regard, machine learning has been successfully applied to solve classification problems for SFP. Nevertheless, the presence of different software metrics, the redundant and irrelevant features and the imbalanced nature of software datasets have created more and more challenges for the classification problems. Therefore, the objective of this study is to independently examine software metrics with multiple Feature Selection (FS) combined with Data Balancing (DB) using Synthetic Minority Oversampling Techniques for improving classification performance. Accordingly, a new framework that efficiently handles those challenges in a combined form on both Object Oriented Metrics (OOM) and Static Code Metrics (SCM) datasets is proposed. The experimental results confirm that the prediction performance could be compromised without suitable Feature Selection Techniques (FST). To mitigate that, data must be balanced. Thus our combined technique assures the robust performance. Furthermore, a combination of Random Forts (RF) with Information Gain (IG) FS yields the highest Receiver Operating Characteristic (ROC) curve (0.993) value, which is found to be the best combination when SCM are used, whereas the combination of RF with Correlation-based Feature Selection (CFS) guarantees the highest ROC (0.909) value, which is found to be the best choice when OOM are used. Therefore, as shown in this study, software metrics used to predict the fault proneness of the software modules must be carefully examined and suitable FST for software metrics must be cautiously selected. Moreover, DB must be applied in order to obtain robust performance. In addition to that, dealing with the challenges mentioned above, the proposed framework ensures the remarkable classification performance and lays the pathway to quality assurance of software.

[1]  Moataz A. Ahmed,et al.  Machine Learning-Based Software Quality Prediction Models: State of the Art , 2013, 2013 International Conference on Information Science and Applications (ICISA).

[2]  Richard Torkar,et al.  Software fault prediction metrics: A systematic literature review , 2013, Inf. Softw. Technol..

[3]  Qing Li,et al.  Three-way decisions based software defect prediction , 2016, Knowl. Based Syst..

[4]  Xiao Liu,et al.  An empirical study on software defect prediction with a simplified metric set , 2014, Inf. Softw. Technol..

[5]  Ali Selamat,et al.  A survey on software fault detection based on different prediction approaches , 2014, Vietnam Journal of Computer Science.

[6]  Iker Gondra,et al.  Applying machine learning to software fault-proneness prediction , 2008, J. Syst. Softw..

[7]  Robert X. Gao,et al.  PCA-based feature selection scheme for machine defect classification , 2004, IEEE Transactions on Instrumentation and Measurement.

[8]  José Salvador Sánchez,et al.  On the effectiveness of preprocessing methods when dealing with different levels of class imbalance , 2012, Knowl. Based Syst..

[9]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[10]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[11]  Kapil Sharma,et al.  Optimizing testing efforts based on change proneness through machine learning techniques , 2014 .

[12]  Lionel C. Briand,et al.  A systematic and comprehensive investigation of methods to build and evaluate fault prediction models , 2010, J. Syst. Softw..

[13]  Cagatay Catal,et al.  Software fault prediction: A literature review and current trends , 2011, Expert Syst. Appl..

[14]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[15]  Chenggang Bai,et al.  Cross-Project Aging Related Bug Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[16]  Tracy Hall,et al.  A Systematic Literature Review on Fault Prediction Performance in Software Engineering , 2012, IEEE Transactions on Software Engineering.

[17]  Xiang Chen,et al.  FECAR: A Feature Selection Framework for Software Defect Prediction , 2014, 2014 IEEE 38th Annual Computer Software and Applications Conference.

[18]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[19]  Hui Li,et al.  Application of Random-SMOTE on Imbalanced Data Mining , 2011, 2011 Fourth International Conference on Business Intelligence and Financial Engineering.

[20]  Taghi M. Khoshgoftaar,et al.  A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction , 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[21]  Tim Menzies,et al.  Too much automation? The bellwether effect and its implications for transfer learning , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[22]  Choochart Haruechaiyasak,et al.  Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm , 2015, 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA).

[23]  Sunghun Kim,et al.  Reducing Features to Improve Code Change-Based Bug Prediction , 2013, IEEE Transactions on Software Engineering.

[24]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[25]  Russel Pears,et al.  Synthetic Minority Over-sampling TEchnique (SMOTE) for Predicting Software Build Outcomes , 2014, SEKE.

[26]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[27]  Huan Liu,et al.  A selective sampling approach to active feature selection , 2004, Artif. Intell..

[28]  Tibor Gyimóthy,et al.  Empirical validation of object-oriented metrics on open source software for fault prediction , 2005, IEEE Transactions on Software Engineering.

[29]  Ruchika Malhotra,et al.  A systematic review of machine learning techniques for software fault prediction , 2015, Appl. Soft Comput..

[30]  Tim Menzies,et al.  Heterogeneous Defect Prediction , 2018, IEEE Trans. Software Eng..

[31]  Fengxi Song,et al.  Feature Selection Using Principal Component Analysis , 2010, 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization.

[32]  Bart Baesens,et al.  Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers , 2013, IEEE Transactions on Software Engineering.

[33]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[34]  Dilip Kumar Yadav,et al.  A fuzzy logic based approach for phase-wise software defects prediction using software metrics , 2015, Inf. Softw. Technol..

[35]  Mohammad Alshayeb,et al.  Software defect prediction using ensemble learning on selected features , 2015, Inf. Softw. Technol..

[36]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[37]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[38]  Hesham Ahmed Hefny,et al.  Handling autism imbalanced data using synthetic minority over-sampling technique (SMOTE) , 2015, 2015 Third World Conference on Complex Systems (WCCS).

[39]  Taghi M. Khoshgoftaar,et al.  Learning with limited minority class data , 2007, ICMLA 2007.