A feature dependent Naive Bayes approach and its application to the software defect prediction problem

Abstract

Naive Bayes is one of the most widely used classification algorithms because of its simplicity, effectiveness, and robustness. It suits many learning scenarios, such as image classification, fraud detection, web mining, and text classification. Naive Bayes is a probabilistic approach that assumes features are independent of each other and equally important. In practice, however, features may be interrelated, and in that case these assumptions may cause a dramatic decrease in performance. This study proposes a Feature Dependent Naive Bayes (FDNB) classification method that is applied after a set of preprocessing steps. Features enter the calculation in pairs so that the dependence between them is taken into account. The method was applied to the software defect prediction problem, and experiments were carried out on the widely recognized NASA PROMISE data sets. The results show that the new method is more successful than the standard Naive Bayes approach and is competitive with other feature-weighting techniques. A further aim of this study is to demonstrate that, to be reliable, a learning model must be constructed using only training data; otherwise, using the entire data set produces misleading results.
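The idea of scoring features in pairs can be illustrated with a small sketch. This is not the FDNB method itself, whose details the abstract does not give; it is a minimal illustration that assumes already discretized features, Laplace smoothing, and the use of every feature pair. The function names `fit_pairwise_nb` and `predict_pairwise_nb` are hypothetical. All counts are taken from the training rows only, in line with the point about building the model from training data alone.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def fit_pairwise_nb(X_train, y_train):
    """Collect class counts and joint counts of every feature pair.

    Illustrative assumption: features are already discrete, and only
    training rows are passed in, so no test data leaks into the counts.
    """
    n_features = len(X_train[0])
    pairs = list(combinations(range(n_features), 2))
    class_counts = Counter(y_train)
    pair_counts = defaultdict(Counter)  # (label, pair) -> counts of value pairs
    pair_values = defaultdict(set)      # pair -> distinct value pairs seen
    for row, label in zip(X_train, y_train):
        for i, j in pairs:
            value = (row[i], row[j])
            pair_counts[(label, (i, j))][value] += 1
            pair_values[(i, j)].add(value)
    return {
        "pairs": pairs,
        "class_counts": class_counts,
        "pair_counts": pair_counts,
        "pair_values": pair_values,
    }


def predict_pairwise_nb(model, row):
    """Return the label with the highest smoothed log score for one row."""
    total = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_c in model["class_counts"].items():
        score = math.log(n_c / total)  # log prior
        for i, j in model["pairs"]:
            count = model["pair_counts"][(label, (i, j))][(row[i], row[j])]
            # One extra slot is reserved for value pairs unseen in training.
            k = len(model["pair_values"][(i, j)]) + 1
            score += math.log((count + 1) / (n_c + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # XOR-style toy data: the label depends on the two features jointly,
    # which per-feature (independent) likelihoods cannot capture.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [0, 1, 1, 0]
    model = fit_pairwise_nb(X, y)
    print([predict_pairwise_nb(model, row) for row in X])
```

On the toy data, each single feature is equally common in both classes, so a standard Naive Bayes score cannot separate them, while the joint counts of the pair can. Whether pairing helps on real defect data is an empirical question that this sketch does not answer.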
