GENCCS: A Correlated Group Difference Approach to Contrast Set Mining

Contrast set mining has developed as a data mining task that aims to discern differences among groups. These groups can be patients, organizations, molecules, or even timelines, and are defined by a selected property that distinguishes one group from another. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The search for contrast sets can be prohibitively expensive on relatively large datasets because every combination of attribute-value pairs must be examined, so the search space can grow exponentially. In this paper, we introduce the notion of a correlated group difference (CGD) and propose a contrast set mining technique that uses mutual information and all-confidence to select the most highly correlated attribute-value pairs, in order to mine CGDs. Our experiments on real datasets demonstrate the efficiency of our approach and the interestingness of the discovered CGDs.
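The abstract names mutual information and all-confidence without defining them. The Python sketch below uses their textbook definitions: empirical mutual information between two categorical attributes, all-confidence of a set of attribute-value pairs (support of the conjunction divided by the largest single-pair support), and the per-group support difference commonly used to judge contrast sets. The function names, the `top_k` and `min_all_conf` parameters, the toy records, and the way the two measures are chained in `correlated_value_pairs` are hypothetical illustrations, not the GENCCS algorithm as published.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # attribute name -> categorical value
Item = Tuple[str, str]   # one (attribute, value) pair


def mutual_information(records: Sequence[Record], a: str, b: str) -> float:
    """Empirical mutual information I(A;B), in bits, between attributes a and b."""
    n = len(records)
    joint = Counter((r[a], r[b]) for r in records)
    marg_a = Counter(r[a] for r in records)
    marg_b = Counter(r[b] for r in records)
    mi = 0.0
    for (va, vb), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((marg_a[va] / n) * (marg_b[vb] / n)))
    return mi


def support(records: Sequence[Record], itemset: Sequence[Item]) -> float:
    """Fraction of records that satisfy every (attribute, value) pair in itemset."""
    if not records:
        return 0.0
    hits = sum(all(r[a] == v for a, v in itemset) for r in records)
    return hits / len(records)


def all_confidence(records: Sequence[Record], itemset: Sequence[Item]) -> float:
    """supp(X) divided by the largest support of any single pair in X."""
    max_single = max(support(records, [item]) for item in itemset)
    return support(records, itemset) / max_single if max_single else 0.0


def support_difference(records: Sequence[Record], itemset: Sequence[Item],
                       group_attr: str) -> float:
    """Largest gap between the per-group supports of itemset (max minus min)."""
    groups: Dict[str, List[Record]] = {}
    for r in records:
        groups.setdefault(r[group_attr], []).append(r)
    supports = [support(members, itemset) for members in groups.values()]
    return max(supports) - min(supports)


def correlated_value_pairs(records: Sequence[Record], attrs: Sequence[str],
                           top_k: int = 1, min_all_conf: float = 0.5) -> List[List[Item]]:
    """Hypothetical filter: rank attribute pairs by mutual information, keep the
    top_k, then keep value pairs within them whose all-confidence >= min_all_conf."""
    ranked = sorted(combinations(attrs, 2),
                    key=lambda ab: mutual_information(records, ab[0], ab[1]),
                    reverse=True)
    selected: List[List[Item]] = []
    for a, b in ranked[:top_k]:
        # sorted() makes the output order deterministic
        for va, vb in sorted({(r[a], r[b]) for r in records}):
            itemset = [(a, va), (b, vb)]
            if all_confidence(records, itemset) >= min_all_conf:
                selected.append(itemset)
    return selected


# Invented toy data; "group" is the property that defines the groups.
records = [
    {"smoker": "yes", "cough": "yes", "region": "north", "group": "sick"},
    {"smoker": "yes", "cough": "yes", "region": "south", "group": "sick"},
    {"smoker": "no", "cough": "no", "region": "north", "group": "healthy"},
    {"smoker": "no", "cough": "no", "region": "south", "group": "healthy"},
]

for itemset in correlated_value_pairs(records, ["smoker", "cough", "region"]):
    print(itemset, support_difference(records, itemset, "group"))
```

On the toy data, only the smoker/cough attribute pair survives the mutual-information ranking, and each of its two value pairs has a support difference of 1.0 between the groups. Taking max minus min equals the largest pairwise difference across groups; a full contrast set miner would additionally apply a statistical significance test (for example a chi-square test) before reporting a set.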

Robert J. Hilderman | Mondelle Simeon