Paper information - Discovering statistically non-redundant subgroups

The objective of subgroup discovery is to find groups of individuals who differ statistically from the rest of a large data set. Most existing subgroup quality measures are intuitive but do not precisely capture the statistical difference between a group and the others, and the results they produce contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups with respect to a given outcome, which makes it well suited to quantifying subgroup quality. In this paper, we propose a statistically sound framework for non-redundant subgroup discovery: the quality of a subgroup is measured by its odds ratio, and statistically non-redundant subgroups are defined by the error bounds of the odds ratios. We show that the proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups.
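The two ingredients named in the abstract, the odds ratio and its error bounds, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so everything below is an assumption: the function names `odds_ratio_ci` and `is_redundant` are hypothetical, the error bound is taken to be the standard Wald confidence interval on the log odds ratio, and the redundancy rule (a more specific subgroup is redundant when its odds ratio falls inside the interval of its more general subgroup) is one plausible reading rather than the authors' stated criterion.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: in subgroup, outcome present     b: in subgroup, outcome absent
    c: outside subgroup, outcome present d: outside subgroup, outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    # Assumption: apply the Haldane-Anscombe 0.5 correction when a cell is zero.
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


def is_redundant(general_table, specific_table, z=1.96):
    """Hypothetical rule: the specific subgroup adds nothing if its odds
    ratio lies within the error bounds of the general subgroup."""
    _, lower, upper = odds_ratio_ci(*general_table, z=z)
    specific_ratio, _, _ = odds_ratio_ci(*specific_table, z=z)
    return lower <= specific_ratio <= upper


general = (20, 80, 10, 90)        # odds ratio 2.25
similar = (10, 40, 20, 130)       # odds ratio 1.625, inside the interval
much_stronger = (15, 5, 15, 165)  # odds ratio 33, outside the interval

print(odds_ratio_ci(*general))
print(is_redundant(general, similar))
print(is_redundant(general, much_stronger))
```

Under these assumptions, the `similar` subgroup would be pruned as redundant while `much_stronger` would be kept, since only the latter differs from its generalisation by more than the error bound.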
