Software Defect Prediction Based on Association Rule Classification

In software defect prediction, predictive models are estimated based on various code attributes to assess the likelihood of software modules containing errors. Many classification methods have been suggested to accomplish this task. However, association based classification methods have not been investigated so far in this context. This paper assesses the use of such a classification method, CBA2, and compares it to other rule based classification methods. Furthermore, we investigate whether rule sets generated on data from one software project can be used to predict defective software modules in other, similar software projects. It is found that applying the CBA2 algorithm results in both accurate and comprehensible rule sets.

[1]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[2]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[3]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[4]  Glenn Fung,et al.  Rule extraction from linear support vector machines , 2005, KDD '05.

[5]  Tom Fawcett,et al.  Using rule sets to maximize ROC performance , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[6]  Yiming Ma,et al.  Improving an Association Rule Based Classifier , 2000, PKDD.

[7]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[8]  Tim Menzies,et al.  The \{PROMISE\} Repository of Software Engineering Databases. , 2005 .

[9]  Iker Gondra,et al.  Applying machine learning to software fault-proneness prediction , 2008, J. Syst. Softw..

[10]  C. Smith Diagnostic tests (1) – sensitivity and specificity , 2012, Phlebology.

[11]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[12]  Magne Jørgensen,et al.  An analysis of data sets used to train and validate cost prediction systems , 2005, PROMISE '05.

[13]  Banu Diri,et al.  A systematic review of software fault prediction studies , 2009, Expert Syst. Appl..

[14]  Banu Diri,et al.  Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem , 2009, Inf. Sci..

[15]  Ian Witten,et al.  Data Mining , 2000 .

[16]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[17]  Lenon Beeson,et al.  Introductory Statistics: A Decision Map , 1978 .

[18]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[19]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.