Associative Classification with Statistically Significant Positive and Negative Rules

Rule-based classifiers are widely used in decision support systems such as medical diagnosis and financial fraud detection. One major advantage is that their models are human-understandable and can be edited. Associative classifiers, an extension of rule-based classifiers, use association rules to associate attributes with class labels. A delicate issue with associative classifiers is their dependence on two sensitive thresholds: minimum support and minimum confidence. Without prior knowledge, it can be difficult to choose proper thresholds, and rules discovered within the support-confidence framework are not guaranteed to be statistically significant: noisy rules may be included and valuable rules excluded. Moreover, most associative classifiers proposed so far are built with only positive association rules. Negative rules, however, can also provide valuable information for discriminating between classes. To address these problems, we propose a novel associative classifier built upon both positive and negative classification association rules that show statistically significant dependencies. Experimental results on real-world datasets show that our method achieves competitive or even better performance than well-known rule-based and associative classifiers in terms of both classification accuracy and computational efficiency.
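To make the idea of a statistically significant positive or negative rule concrete, the following is a minimal sketch, not the method of the paper. It assumes a one-sided Fisher's exact test as the significance measure, which the abstract does not specify; the function name `rule_p_value`, its parameters, and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
from math import comb


def rule_p_value(n_xc, n_x, n_c, n, positive=True):
    """One-sided Fisher's exact test for a rule between itemset X and class c.

    n_xc: records containing X and labeled c
    n_x:  records containing X
    n_c:  records labeled c
    n:    total number of records

    positive=True  tests X -> c      (X and c co-occur more often than chance)
    positive=False tests X -> not c  (X and c co-occur less often than chance)
    """
    # Feasible range of the co-occurrence count under fixed margins.
    lo = max(0, n_x + n_c - n)
    hi = min(n_x, n_c)
    # Sum the hypergeometric tail on the side being tested.
    ks = range(n_xc, hi + 1) if positive else range(lo, n_xc + 1)
    tail = sum(comb(n_c, k) * comb(n - n_c, n_x - k) for k in ks)
    return tail / comb(n, n_x)


# Illustrative (made-up) counts: 100 records, 50 in class c, X appears in 20.
p_positive = rule_p_value(n_xc=18, n_x=20, n_c=50, n=100, positive=True)
p_negative = rule_p_value(n_xc=2, n_x=20, n_c=50, n=100, positive=False)
```

A rule would be kept only if its p-value falls below a chosen significance level, which replaces the minimum support and minimum confidence thresholds in this sketch. A negative rule uses the opposite tail of the same distribution, so both kinds of rules are judged on one scale.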
