Statistical Strategies for Pruning All the Uninteresting Association Rules

We propose a general framework that formalizes the problem of capturing the intensity of implication of association rules through statistical metrics. Within this framework we present properties that influence the interestingness of a rule, analyze the conditions under which a measure can perform a perfect prune in a single step, and define a proper final order for sorting the surviving rules. We discuss why none of the currently employed measures can capture objective interestingness on its own, and why only a combination of some of them, applied in a multi-step fashion, can be reliable. In contrast, we propose a simple new modification of the Pearson coefficient that meets all the necessary requirements. We statistically infer a convenient cut-off threshold for this new metric by empirically describing its distribution function through simulation. Experiments show promising behaviour for our proposal.
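As a rough illustration of the kind of computation involved, the sketch below evaluates the standard Pearson (phi) coefficient of a rule A → B from co-occurrence counts and estimates a cut-off by simulating independent items. This is the textbook coefficient, not the modified version proposed here, whose definition the abstract does not give. The function names, the independence null model, and the 0.95 quantile are all assumptions made for the example.

```python
import math
import random


def phi_coefficient(n_ab, n_a, n_b, n):
    """Standard Pearson (phi) coefficient for two binary items.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A (resp. B)
    n: total number of transactions
    """
    num = n * n_ab - n_a * n_b
    den = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    # Degenerate margins (an item in all or no transactions) give no signal.
    return num / den if den > 0 else 0.0


def simulated_threshold(n, p_a, p_b, trials=2000, quantile=0.95, seed=0):
    """Empirical quantile of phi when A and B are independent (assumed null model)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        n_a = n_b = n_ab = 0
        for _ in range(n):
            a = rng.random() < p_a
            b = rng.random() < p_b
            n_a += a
            n_b += b
            n_ab += a and b
        values.append(phi_coefficient(n_ab, n_a, n_b, n))
    values.sort()
    # Pick the value below which roughly `quantile` of the simulated scores fall.
    return values[min(int(quantile * trials), trials - 1)]


if __name__ == "__main__":
    cutoff = simulated_threshold(n=200, p_a=0.5, p_b=0.5, trials=300)
    score = phi_coefficient(n_ab=80, n_a=100, n_b=100, n=200)
    print("keep rule" if score > cutoff else "prune rule")
```

Under these assumptions, a rule would survive the prune only if its coefficient exceeds what independent items of the same frequency typically produce by chance.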
