Efficient Discovery of Statistically Significant Association Rules

Searching for statistically significant association rules is an important but neglected problem. Traditional association rules do not capture the idea of statistical dependence, so the resulting rules can be spurious while the most significant rules may be missed. This leads to erroneous models and predictions, which often become expensive. The problem is computationally very difficult because significance is not a monotonic property. However, in this paper we prove several other properties that can be used for pruning the search space. The properties are implemented in the StatApriori algorithm, which searches for statistically significant, non-redundant association rules. Both theoretical and empirical observations indicate that the resulting rules are very accurate compared to traditional association rules. In addition, StatApriori can work with extremely low frequencies, thus finding new interesting rules.
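To make the non-monotonicity point concrete, here is a minimal sketch. It assumes a binomial z-score as the significance measure for a rule X -> A. The abstract does not name the measure StatApriori uses, so treat this choice as illustrative. The helper names `support_count` and `z_score` and the toy dataset are hypothetical. On this data, the rule {B} -> A shows no dependence at all, but its specialization {B, C} -> A does. An Apriori-style search that discarded {B} -> A as non-significant would therefore also lose {B, C} -> A.

```python
from math import sqrt
from typing import FrozenSet, List

# Hypothetical helpers for illustration only; not StatApriori's actual code.
Transaction = FrozenSet[str]


def support_count(data: List[Transaction], items: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in data if items <= t)


def z_score(data: List[Transaction], antecedent: FrozenSet[str], consequent: str) -> float:
    """Binomial z-score of the rule antecedent -> consequent (assumed measure).

    Compares the observed count of transactions containing both X and A
    with the count n * P(X) * P(A) expected if X and A were independent.
    """
    n = len(data)
    p_x = support_count(data, antecedent) / n
    p_a = support_count(data, frozenset([consequent])) / n
    p0 = p_x * p_a  # probability of X and A together under independence
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0  # degenerate case: zero variance, so report no evidence
    observed = support_count(data, antecedent | {consequent})
    return (observed - n * p0) / sqrt(n * p0 * (1 - p0))


# Hypothetical toy data: 16 transactions over items A, B, C.
data = (
    [frozenset("ABC")] * 4
    + [frozenset("B")] * 4
    + [frozenset("A")] * 4
    + [frozenset("C")] * 4
)

print(z_score(data, frozenset("B"), "A"))             # 0.0: B and A are independent
print(round(z_score(data, frozenset("BC"), "A"), 3))  # 1.512: BC -> A is positively dependent
```

Because a rule with no dependence can have a more significant specialization, the significance value cannot be used for pruning in the way Apriori uses frequency. This is why the abstract turns to other provable properties for pruning.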

Matti Nykänen | Wilhelmiina Hämäläinen