Software defect prediction using relational association rule mining

This paper addresses defect prediction, a problem of major importance during software maintenance and evolution. Software developers need to identify defective modules in order to continuously improve the quality of a software system. Because the conditions under which a module has defects are hard to identify, machine-learning-based classification models continue to be developed for defect prediction. We propose a novel classification model based on relational association rule mining. Relational association rules extend ordinal association rules, a particular type of association rule that describes numerical orderings between attributes that commonly occur over a dataset. Our classifier uses the discovered relational association rules to predict whether a software module is defective or not. We provide an experimental evaluation of the proposed model on the open-source NASA datasets, together with a comparison to similar existing approaches. The results show that our classifier outperforms the existing machine-learning-based techniques for defect prediction on most of the evaluation measures considered, which confirms the potential of our proposal.
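
To make the idea of "numerical orderings between attributes" concrete, the following is a minimal sketch of how such rules might be mined and used for classification. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: it considers only binary rules of the form "attribute i < attribute j" or "attribute i > attribute j", uses a single support threshold, and classifies a module by comparing the fraction of rules it satisfies in each class. The function names, the threshold value, and the toy metric values (lines of code, complexity, comment count) are all hypothetical.

```python
from itertools import combinations
import operator

# Orderings considered between two numeric attributes (an assumption of this sketch).
RELATIONS = {"<": operator.lt, ">": operator.gt}


def mine_binary_rules(rows, min_support=0.9):
    """Return (i, relation, j) triples that hold in at least min_support of the rows."""
    n_attrs = len(rows[0])
    rules = []
    for i, j in combinations(range(n_attrs), 2):
        for name, rel in RELATIONS.items():
            support = sum(rel(row[i], row[j]) for row in rows) / len(rows)
            if support >= min_support:
                rules.append((i, name, j))
    return rules


def satisfied_fraction(row, rules):
    """Fraction of the given rules that a single row satisfies."""
    if not rules:
        return 0.0
    hits = sum(RELATIONS[name](row[i], row[j]) for i, name, j in rules)
    return hits / len(rules)


def predict_defective(row, defective_rules, clean_rules):
    """Predict 'defective' when the row agrees more with the defective-class rules."""
    return satisfied_fraction(row, defective_rules) > satisfied_fraction(row, clean_rules)


# Hypothetical toy data: (lines of code, complexity, comment count) per module.
defective = [(120, 30, 5), (200, 45, 10), (90, 25, 2)]
clean = [(40, 3, 12), (60, 5, 20), (30, 2, 9)]

defective_rules = mine_binary_rules(defective)
clean_rules = mine_binary_rules(clean)

print(defective_rules)
print(clean_rules)
print(predict_defective((150, 40, 8), defective_rules, clean_rules))
print(predict_defective((50, 4, 15), defective_rules, clean_rules))
```

In this toy data the two classes differ only in the ordering between complexity and comment count, so that single rule decides the prediction. A realistic setting would involve many more attributes, rules over more than two attributes, and a confidence measure alongside support.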
