Identifying Association between Longer Itemsets and Software Defects

Software defects are an indicator of software quality, and software with fewer defective modules is desirable. Predicting defects from software measurements facilitates early identification of defect-prone modules, and association relationships between software measures and defects improve that prediction. To find such relationships, each numeric measure is divided into bins. Each bin is called a 1-itemset (an itemset of length 1). When certain itemsets and defective modules appear together in a dataset, they are considered associated with each other, and the frequency of their co-occurrence indicates the strength of the association. Existing studies examine the relationship between 1-itemsets and defective modules. Itemsets that are highly associated with defects are called focused itemsets, and they can be used to build prediction models with higher Recall values. This paper explores the relationship between defective modules and itemsets of length greater than 1. Such focused itemsets involve bins of multiple measures at the same time. Identifying these focused itemsets improved the performance of a decision-tree-based defect prediction model.
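The binning and co-occurrence idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy data, the measure names (`loc`, `complexity`), the equal-width binning with two bins, and the use of standard association-rule support and confidence as the strength measure are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: (loc, complexity, defective). Not from the paper.
modules = [
    (10, 1, False),
    (120, 9, True),
    (130, 8, True),
    (15, 2, False),
    (110, 2, False),
    (125, 9, True),
]
MEASURES = ("loc", "complexity")  # assumed measure names


def equal_width_bin(value, lo, hi, n_bins):
    """Return the 0-based equal-width bin index of value within [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls in the last bin


def to_transactions(rows, n_bins=2):
    """Turn each module into a set of (measure, bin) items plus its defect label."""
    columns = list(zip(*[row[:-1] for row in rows]))
    ranges = [(min(col), max(col)) for col in columns]
    transactions = []
    for row in rows:
        items = frozenset(
            (name, equal_width_bin(value, lo, hi, n_bins))
            for name, value, (lo, hi) in zip(MEASURES, row[:-1], ranges)
        )
        transactions.append((items, row[-1]))
    return transactions


def defect_association(transactions, length):
    """Map each itemset of the given length to (support, confidence).

    support    = fraction of all modules that contain the itemset and are defective
    confidence = fraction of modules containing the itemset that are defective
    """
    counts = {}
    for items, defective in transactions:
        for itemset in combinations(sorted(items), length):
            seen, hits = counts.get(itemset, (0, 0))
            counts[itemset] = (seen + 1, hits + int(defective))
    total = len(transactions)
    return {k: (hits / total, hits / seen) for k, (seen, hits) in counts.items()}


transactions = to_transactions(modules)
one_itemsets = defect_association(transactions, 1)
two_itemsets = defect_association(transactions, 2)
```

On this toy data, the 1-itemset `(("loc", 1),)` co-occurs with defects in 3 of the 4 modules that contain it, whereas the 2-itemset `(("complexity", 1), ("loc", 1))` co-occurs with defects in all 3 modules that contain it. This is the kind of stronger association that longer itemsets may expose, though real datasets need not behave this way.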
