A novel software defect prediction based on atomic class-association rule mining

Abstract To allocate software testing resources rationally and reduce costs, software defect prediction has drawn notable attention, and many “white-box” and “black-box” classification algorithms have been applied to it. Although many studies have used software product metrics to identify defect-prone modules, defect prediction algorithms are still worth exploring. For instance, it is not easy to apply the Apriori algorithm directly to classify defect-prone modules in a skewed dataset. Therefore, we propose a novel supervised approach for software defect prediction based on atomic class-association rule mining (ACAR). An atomic class-association rule is a specific kind of association rule with only one feature in the antecedent and a unique class label in the consequent; it explores the relationship between attributes and categories. Such association patterns can provide meaningful knowledge that software engineers can easily understand. A new software defect prediction model infrastructure based on association rules, divided into data preprocessing, rule model building and performance evaluation, is employed to improve the prediction of defect-prone modules. Moreover, ACAR achieves satisfactory classification performance compared with seven other benchmark learners (the extension of classification based on associations (CBA2), Support Vector Machine, Naive Bayesian, Decision Tree, OneR, K-nearest Neighbors and RIPPER) on the NASA MDP and PROMISE datasets. With regard to associative software defect prediction, a comparative experiment between ACAR and CBA2 is discussed in detail. It is demonstrated that ACAR is better than CBA2 in terms of AUC, G-mean, Balance, and understandability. In addition, the average AUC of ACAR reaches 81.1%, an increase of 2.9% over CBA2.
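To make the rule form concrete, the following is a minimal sketch of mining rules with a single feature in the antecedent and a class label in the consequent. It is not the ACAR algorithm as specified in the paper: the function name `mine_atomic_rules`, the support and confidence thresholds, the toy feature names (`loc`, `cc`) and the assumption that features are already discretized are all illustrative choices made here.

```python
from collections import Counter


def mine_atomic_rules(rows, labels, min_support=0.1, min_confidence=0.6):
    """Mine rules of the form (feature = value) -> class label.

    rows   : list of dicts mapping feature name to a discretized value
    labels : list of class labels, one per row
    Thresholds are hypothetical defaults, not values from the paper.
    """
    n = len(rows)
    antecedent_counts = Counter()  # how often (feature, value) occurs
    rule_counts = Counter()        # how often (feature, value) co-occurs with a label

    for row, label in zip(rows, labels):
        for feature, value in row.items():
            antecedent_counts[(feature, value)] += 1
            rule_counts[(feature, value, label)] += 1

    rules = []
    for (feature, value, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[(feature, value)]
        if support >= min_support and confidence >= min_confidence:
            rules.append({
                "feature": feature,
                "value": value,
                "label": label,
                "support": support,
                "confidence": confidence,
            })

    # Rank rules by confidence, then support (a common, assumed ordering).
    rules.sort(key=lambda r: (-r["confidence"], -r["support"]))
    return rules


# Toy, made-up data: two discretized metrics per module.
rows = [
    {"loc": "high", "cc": "high"},
    {"loc": "high", "cc": "low"},
    {"loc": "low", "cc": "low"},
    {"loc": "low", "cc": "low"},
    {"loc": "low", "cc": "high"},
    {"loc": "high", "cc": "high"},
]
labels = ["defective", "defective", "clean", "clean", "clean", "defective"]

rules = mine_atomic_rules(rows, labels)
for r in rules:
    print(f'{r["feature"]}={r["value"]} -> {r["label"]} '
          f'(support={r["support"]:.2f}, confidence={r["confidence"]:.2f})')
```

Because each rule tests exactly one attribute, every mined rule can be read directly as a statement such as "modules with high `loc` tend to be defective", which is the kind of understandability the abstract refers to.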
