A novel Bayes defect predictor based on information diffusion function

Abstract: Software defect prediction plays a significant part in identifying the most defect-prone modules before software testing. Many researchers have worked to improve prediction accuracy, but the shortage of historical data for within-project and cross-project prediction remains unresolved. Further, it is common practice to use the probability density function of a normal distribution in the Naive Bayes (NB) classifier. However, a Kolmogorov–Smirnov test shows that the 21 main software metrics are not normally distributed at the 5% significance level. This paper therefore proposes a new Bayes classifier, which extends the NB classifier with a non-normal information diffusion function, to help address the lack of appropriate training data for new projects. We conduct three experiments on 34 data sets obtained from 10 open-source projects, training on only 10%, 6.67%, 5%, 3.33% and 2% of the total data. Four well-known classification algorithms are included for comparison: Logistic Regression, Naive Bayes, Random Tree and Support Vector Machine. All experimental results demonstrate the efficiency and practicability of the new classifier.
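To make the idea concrete, the following is a minimal sketch of a Naive Bayes classifier whose per-metric class-conditional densities come from diffusing each training sample over the metric axis rather than from a single fitted normal curve. It is not the paper's classifier: the abstract does not give the non-normal diffusion function, so a Gaussian kernel stands in for it here, and the bandwidth rule (`1.06 * std * n^(-1/5)`), the class name `DiffusionNaiveBayes`, and the toy metric values are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def bandwidth(samples):
    """Rule-of-thumb diffusion width (an assumption, not the paper's coefficient)."""
    n = len(samples)
    spread = pstdev(samples) if n > 1 else 0.0
    # Floor the width so identical samples do not produce a zero bandwidth.
    return max(1.06 * spread * n ** (-0.2), 1e-3)


def diffusion_density(u, samples, h):
    """Density at u: the average of Gaussian bumps centred on each sample."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((u - x) ** 2) / (2.0 * h * h)) for x in samples)


class DiffusionNaiveBayes:
    """Naive Bayes with sample-diffusion densities in place of normal PDFs."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {}
        self.columns_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # Store each metric's training values with its diffusion width.
            cols = [list(col) for col in zip(*rows)]
            self.columns_[c] = [(col, bandwidth(col)) for col in cols]
        return self

    def log_score(self, x, c):
        """Log prior plus the sum of per-metric log densities (naive independence)."""
        total = self.log_prior_[c]
        for value, (col, h) in zip(x, self.columns_[c]):
            # The tiny constant avoids log(0) when the density underflows.
            total += math.log(diffusion_density(value, col, h) + 1e-300)
        return total

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self.log_score(x, c)) for x in X]


# Toy data (hypothetical): two metrics per module, label 1 = defective.
X_train = [[10, 1], [12, 2], [11, 1], [80, 9], [95, 12], [90, 10]]
y_train = [0, 0, 0, 1, 1, 1]
model = DiffusionNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[13, 2], [85, 11]])
```

Because every training sample contributes its own bump, the estimated density can follow skewed or multi-modal metric distributions even when only a handful of labelled modules are available, which is the small-sample setting the abstract targets.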
