Paper Information - Discovering Predictive Association Rules

Discovering Predictive Association Rules

Association rule algorithms can produce a very large number of output patterns. This has raised questions of whether the set of discovered rules "overfits" the data because all the patterns that satisfy some constraints are generated (the Bonferroni effect). In other words, the question is whether some of the rules are "false discoveries" that are not statistically significant. We present a novel approach for estimating the number of "false discoveries" at any cutoff level. Empirical evaluation shows that on typical datasets the fraction of rules that may be false discoveries is very small. A bonus of this work is that the statistical significance measures we compute are a good basis for ordering the rules for presentation to users, since they correspond to the statistical "surprise" of the rule. We also show how to compute confidence intervals for the support and confidence of an association rule, enabling the rule to be used predictively on future data.
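To make the last point of the abstract concrete, here is a minimal sketch of support, confidence, and an interval around each. The abstract does not say how the authors construct their intervals, so the normal-approximation (Wald) binomial interval used here is an assumption, not the paper's method. The function names `rule_statistics` and `wald_interval` and the toy transactions are hypothetical.

```python
import math


def rule_statistics(transactions, antecedent, consequent):
    """Return (support, confidence, n_antecedent) for antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Transactions containing every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions containing every item of both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence, n_antecedent


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1].

    Assumption: transactions are independent draws, so the counts are binomial.
    z = 1.96 corresponds to roughly 95% coverage.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical toy data: five market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, n_antecedent = rule_statistics(
    transactions, {"bread"}, {"milk"}
)
# Support is a proportion over all transactions.
support_ci = wald_interval(support, len(transactions))
# Confidence is a proportion over transactions containing the antecedent.
confidence_ci = wald_interval(confidence, n_antecedent)
```

With only five transactions the intervals are very wide, which is the point: a rule that looks strong on a small sample may say little about future data. The Wald interval is known to behave poorly for small samples or proportions near 0 or 1, so a Wilson or exact interval may be preferable in practice.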

Nimrod Megiddo | Ramakrishnan Srikant
