Effective multi-objective naïve Bayes learning for cross-project defect prediction

We propose novel multi-objective learning techniques that account for class imbalance in cross-project defect prediction.
The proposed approaches (MONB and MONBNN) show better diversity than existing multi-objective prediction models.
The proposed approaches achieve prediction performance similar to that of within-project defect prediction models.

Software defect prediction identifies fault-prone modules so that they can be tested thoroughly, which lets limited quality-control resources be allocated effectively. When sufficient local data are unavailable, defects can still be predicted through cross-project defect prediction (CPDP), which builds the classifier from data collected in other projects. Software defect datasets suffer from class imbalance: the defect class has far fewer instances than the non-defect class. If defective instances are not predicted correctly, software quality may degrade. A classifier therefore has to achieve high accuracy on the defect class without severely worsening accuracy on the non-defect class. This requirement maps naturally onto multi-objective (MO) optimization, because MO predictive models aim to balance several competing objectives. In this paper, we aim to identify effective MO learning techniques for cross-project (CP) environments. We devise three objectives that reflect the class imbalance context: the first maximizes the probability of detection (PD), the second minimizes the probability of false alarm (PF), and the third maximizes overall performance (e.g., Balance). We propose novel MO naïve Bayes learning techniques built on the Harmony Search meta-heuristic algorithm, and we compare them with single-objective models, other existing MO models, and within-project defect prediction models.
The experimental results show that the proposed approaches are promising and can therefore be applied effectively to meet various prediction needs in CP settings.
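The three objectives can be made concrete with a small Python sketch. It assumes the standard confusion-matrix definitions PD = TP/(TP+FN) and PF = FP/(FP+TN), with defective modules as the positive class, and the Balance formula commonly used in defect prediction (one minus the normalized Euclidean distance from the ideal point PD = 1, PF = 0); the abstract names Balance only as an example, so that definition is an assumption here. The helper names (`pd_pf_balance`, `objectives`, `dominates`, `pareto_front`) and the classifier counts are hypothetical and only illustrate how Pareto dominance compares models on competing objectives; they are not results from the paper.

```python
import math
from typing import Dict, List, Tuple

Objectives = Tuple[float, float, float]


def pd_pf_balance(tp: int, fn: int, fp: int, tn: int) -> Objectives:
    """Return (PD, PF, Balance) from confusion-matrix counts (defective = positive)."""
    pd = tp / (tp + fn) if tp + fn else 0.0   # share of defective modules detected
    pf = fp / (fp + tn) if fp + tn else 0.0   # share of clean modules falsely flagged
    # Assumed Balance definition: 1 minus the normalized distance from (PD = 1, PF = 0).
    balance = 1.0 - math.sqrt(pf ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    return pd, pf, balance


def objectives(tp: int, fn: int, fp: int, tn: int) -> Objectives:
    """Express all three objectives as quantities to maximize: PD, -PF, Balance."""
    pd, pf, bal = pd_pf_balance(tp, fn, fp, tn)
    return pd, -pf, bal


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [name for name, v in candidates.items()
            if not any(dominates(w, v) for other, w in candidates.items() if other != name)]


# Hypothetical test set: 20 defective and 80 clean modules, three classifiers.
candidates = {
    "A": objectives(tp=15, fn=5, fp=20, tn=60),   # PD 0.75, PF 0.25
    "B": objectives(tp=10, fn=10, fp=8, tn=72),   # PD 0.50, PF 0.10
    "C": objectives(tp=10, fn=10, fp=20, tn=60),  # PD 0.50, PF 0.25
}
print(pareto_front(candidates))  # → ['A', 'B']
```

Classifier C drops out because A and B each dominate it, while A (higher PD) and B (lower PF) trade off against each other. Keeping such non-dominated models, rather than a single "best" one, is what lets an MO learner serve different prediction needs.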

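The abstract does not specify how Harmony Search drives naïve Bayes, so the following is only a hypothetical sketch of the general idea, not the MONB or MONBNN algorithm. It assumes a Gaussian naïve Bayes whose per-feature log-likelihoods are scaled by weights, and a basic Harmony Search (harmony memory size `hms`, memory consideration rate `hmcr`, pitch adjustment rate `par`, bandwidth `bw`) that searches those weights. For brevity it scalarizes the problem by maximizing Balance alone, whereas the paper optimizes three objectives; the weighting scheme, parameter values, and synthetic data are all assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: Harmony Search tunes per-feature weights of a Gaussian
# naive Bayes, scored by Balance only (a scalarized stand-in for the MO setup).
Stats = Dict[int, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[int]) -> Stats:
    """Estimate class prior, feature means and variances (0 = clean, 1 = defective)."""
    stats: Stats = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # A tiny floor keeps variances positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        stats[c] = (len(rows) / len(X), means, variances)
    return stats


def predict(stats: Stats, weights: Sequence[float], x: Sequence[float]) -> int:
    """Return the class with the highest weighted log-posterior score."""
    best_class, best_score = 0, -math.inf
    for c, (prior, means, variances) in stats.items():
        score = math.log(prior)
        for w, v, m, s2 in zip(weights, x, means, variances):
            # Each feature's Gaussian log-likelihood is scaled by its weight.
            score += w * (-0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def balance(stats: Stats, weights: Sequence[float], X, y) -> float:
    """Balance = 1 - sqrt(PF^2 + (1 - PD)^2) / sqrt(2) on the given data."""
    tp = fn = fp = tn = 0
    for x, label in zip(X, y):
        p = predict(stats, weights, x)
        if label == 1:
            tp, fn = tp + p, fn + (1 - p)
        else:
            fp, tn = fp + p, tn + (1 - p)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return 1.0 - math.sqrt(pf ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)


def harmony_search(fitness: Callable[[List[float]], float], dim: int,
                   lo: float = 0.0, hi: float = 2.0, hms: int = 10,
                   hmcr: float = 0.9, par: float = 0.3, bw: float = 0.1,
                   iters: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Basic single-objective Harmony Search that maximizes `fitness`."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(dim):
            if rng.random() < hmcr:              # memory consideration
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:           # pitch adjustment
                    value += rng.uniform(-bw, bw)
            else:                                # random selection
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        worst = min(range(hms), key=scores.__getitem__)
        new_score = fitness(new)
        if new_score > scores[worst]:            # replace the worst harmony
            memory[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Usage on synthetic, imbalanced data (180 clean vs 20 defective modules).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(180)]
X += [[data_rng.gauss(1.5, 1), data_rng.gauss(1.0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
y = [0] * 180 + [1] * 20
stats = fit_gaussian_nb(X, y)
weights, score = harmony_search(lambda w: balance(stats, w, X, y), dim=3)
print([round(w, 2) for w in weights], round(score, 3))
```

Because the worst harmony is replaced only by a strictly better one, the best Balance in memory never decreases. Replacing the scalar fitness with a vector of PD, PF and Balance, together with a Pareto-dominance replacement rule, would be one way to move this sketch toward a multi-objective variant.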