Paper Information - Controlling False Positives in Association Rule Mining

Controlling False Positives in Association Rule Mining

Association rule mining is an important problem in data mining. It enumerates and tests a large number of rules on a dataset and outputs the rules that satisfy user-specified constraints. Because so many rules are tested, rules that do not represent a real systematic effect in the data can satisfy the given constraints purely by chance. Hence association rule mining often suffers from a high risk of false positive errors. There is a lack of comprehensive study on controlling false positives in association rule mining. In this paper, we adopt three multiple testing correction approaches (the direct adjustment approach, the permutation-based approach and the holdout approach) to control false positives in association rule mining, and conduct extensive experiments to study their performance. Our results show that (1) numerous spurious rules are generated if no correction is made, and (2) the three approaches can control false positives effectively. Among the three approaches, the permutation-based approach has the highest power to detect real association rules, but it is very computationally expensive. We employ several techniques to reduce its cost effectively.
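To make the direct adjustment idea concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not say which adjustment procedures are used, so the Bonferroni and Benjamini-Hochberg procedures below are assumptions chosen as two standard examples. The p-values are hypothetical and stand in for the per-rule significance tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of rules kept after Bonferroni correction (controls the family-wise error rate)."""
    m = len(p_values)
    # Each rule is tested at the stricter level alpha / m.
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of rules kept by the Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    # Rule indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k such that p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    # All rules ranked at or below the cutoff are reported.
    return sorted(order[:cutoff])


# Hypothetical p-values for five candidate rules (illustrative only).
p_values = [0.001, 0.012, 0.025, 0.045, 0.20]

uncorrected = [i for i, p in enumerate(p_values) if p <= 0.05]
print("no correction:", uncorrected)
print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

With these hypothetical values, four rules pass the uncorrected 0.05 threshold, while the corrected procedures keep fewer, which illustrates how correction trims rules that may have passed by chance.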

Guimei Liu | Limsoon Wong | Haojun Zhang
