Interestingness of Association Rules Using Symmetrical Tau and Logistic Regression

While association rule mining is one of the most popular data mining techniques, it usually produces a large number of rules, some of which are not interesting or significant for the application at hand. In this paper, we present a systematic approach for assessing the discovered rules and provide a rigorous statistical approach supporting this framework. The proposed strategy combines data mining and statistical measurement techniques, including redundancy analysis, sampling, and multivariate statistical analysis, to discard non-significant rules. A real-world dataset is used to demonstrate how the proposed unified framework can discard many of the redundant or non-significant rules while still preserving high accuracy of the rule set as a whole.
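The title names symmetrical tau as one of the statistical measures. The code below is a minimal sketch that assumes the commonly cited form of Zhou and Dillon's symmetrical tau. It scores a rule X → Y from the contingency table of counts whose rows are X / not X and whose columns are Y / not Y. The function name `symmetrical_tau` and the table layout are illustrative assumptions, not the authors' code. The logistic regression and sampling steps are not shown. A value near 0 indicates that antecedent and consequent are close to independent, and a value of 1 indicates perfect association.

```python
from typing import List, Sequence


def symmetrical_tau(table: Sequence[Sequence[float]]) -> float:
    """Symmetrical tau for an I x J contingency table of counts (assumed form).

    tau = (sum_ij P_ij^2 / P_+j + sum_ij P_ij^2 / P_i+ - sum_i P_i+^2 - sum_j P_+j^2)
          / (2 - sum_i P_i+^2 - sum_j P_+j^2)
    """
    n = float(sum(sum(row) for row in table))
    if n <= 0:
        raise ValueError("table must contain at least one positive count")

    # Joint probabilities P_ij and the row/column marginals P_i+ and P_+j.
    p: List[List[float]] = [[c / n for c in row] for row in table]
    rows, cols = len(p), len(p[0])
    row_m = [sum(p[i]) for i in range(rows)]
    col_m = [sum(p[i][j] for i in range(rows)) for j in range(cols)]

    sum_row_sq = sum(r * r for r in row_m)
    sum_col_sq = sum(c * c for c in col_m)

    # Cells in an empty row or column have P_ij = 0, so skipping them is exact.
    given_col = sum(p[i][j] ** 2 / col_m[j]
                    for i in range(rows) for j in range(cols) if col_m[j] > 0)
    given_row = sum(p[i][j] ** 2 / row_m[i]
                    for i in range(rows) for j in range(cols) if row_m[i] > 0)

    denom = 2.0 - sum_row_sq - sum_col_sq
    if denom <= 1e-12:
        # All mass sits in a single cell: the measure is undefined.
        raise ValueError("tau is undefined when both marginals are degenerate")
    return (given_col + given_row - sum_row_sq - sum_col_sq) / denom


if __name__ == "__main__":
    # Hypothetical rule X -> Y; counts are made up for illustration.
    table = [[40, 10], [10, 40]]
    print(round(symmetrical_tau(table), 4))  # 0.36
```

Under independence (P_ij = P_i+ P_+j), the first two sums reduce to the squared-marginal terms and the numerator is zero. A rule whose tau falls near 0 is therefore a natural candidate for the "non-significant" bucket. Any cut-off used for that decision is an assumption and is not taken from the paper.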
