Paper information - Detecting change in categorical data: mining contrast sets


A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male and female students, or the same group over time, e.g. freshman students in 1993 versus 1998. We present the problem of mining contrast-sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide an algorithm for mining contrast-sets as well as several pruning rules to reduce the computational complexity. Once the deviations are found, we post-process the results to present a subset that is surprising to the user given what has already been shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
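The core test described above (does a conjunction of attribute-value pairs occur with a meaningfully different frequency in two groups, under a Bonferroni-corrected significance level?) can be sketched for the simplest two-group case. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes records are Python dicts mapping attribute to value, uses a Pearson chi-square test on the 2×2 table (group × contrast set true/false), splits α evenly over a hypothetical `num_tests` candidate sets (plain Bonferroni), and adds a hypothetical `min_dev` threshold on the support difference to stand in for "differ meaningfully". All function and parameter names are illustrative, and the search and pruning rules are not shown.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumption: a record maps attribute names to categorical values.
Record = Dict[str, str]
# A contrast set is a conjunction of (attribute, value) pairs.
ContrastSet = Tuple[Tuple[str, str], ...]


def matches(record: Record, cset: ContrastSet) -> bool:
    """True if the record satisfies every attribute-value pair in the conjunction."""
    return all(record.get(attr) == value for attr, value in cset)


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 degree of freedom, no continuity
    correction) for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the table carries no evidence of a difference.
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def evaluate_contrast_set(
    group_a: Sequence[Record],
    group_b: Sequence[Record],
    cset: ContrastSet,
    alpha: float = 0.05,
    num_tests: int = 1,
    min_dev: float = 0.0,
) -> Tuple[bool, float, float, float]:
    """Return (is_deviation, support_a, support_b, p_value).

    Hypothetical parameters: num_tests is the number of candidate contrast
    sets tested, used for a plain Bonferroni split alpha / num_tests;
    min_dev is a minimum absolute difference in group support.
    """
    hits_a = sum(matches(r, cset) for r in group_a)
    hits_b = sum(matches(r, cset) for r in group_b)
    # Support in a group: fraction of that group's records satisfying the set.
    support_a = hits_a / len(group_a) if group_a else 0.0
    support_b = hits_b / len(group_b) if group_b else 0.0
    p_value = chi2_2x2_pvalue(hits_a, len(group_a) - hits_a,
                              hits_b, len(group_b) - hits_b)
    significant = p_value < alpha / num_tests  # Bonferroni-corrected level
    large = abs(support_a - support_b) >= min_dev
    return significant and large, support_a, support_b, p_value


# Hypothetical data: 60 of 100 students in group A major in CS vs 20 of 100 in group B.
group_a = [{"major": "cs"}] * 60 + [{"major": "bio"}] * 40
group_b = [{"major": "cs"}] * 20 + [{"major": "bio"}] * 80
found, sa, sb, p = evaluate_contrast_set(
    group_a, group_b, (("major", "cs"),), alpha=0.05, num_tests=10, min_dev=0.1
)
print(found, sa, sb)  # True 0.6 0.2
```

Dividing α by the number of tests keeps the probability of any false positive across the whole analysis at or below α, which is why the threshold tightens as more candidate sets are examined. The paper's own allocation of α across search levels may differ from this even split.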

Stephen D. Bay | Michael J. Pazzani
