Detecting change in categorical data: mining contrast sets

A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 versus 1998. We present the problem of mining contrast-sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide an algorithm for mining contrast-sets as well as several pruning rules to reduce the computational complexity. Once the deviations are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
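
To make the notion of a contrast set concrete, the following minimal Python sketch checks a few candidate conjunctions against two groups and keeps those whose support differs significantly after a Bonferroni correction. It is an illustration under stated assumptions, not the algorithm of this paper: the use of Pearson's chi-square test on a 2x2 table, the function names, the attribute names (`laptop`, `housing`), and the toy data are all hypothetical, and the sketch enumerates a fixed candidate list rather than searching or pruning.

```python
import math

def count_matches(rows, contrast_set):
    """Number of rows that match every attribute=value pair in contrast_set."""
    return sum(all(r.get(a) == v for a, v in contrast_set.items()) for r in rows)

def chi2_p_value_2x2(hits_a, n_a, hits_b, n_b):
    """P-value of Pearson's chi-square test (1 degree of freedom) on a 2x2 table.

    Assumption: the abstract does not name a test; chi-square is used here
    only for illustration.
    """
    table = [[hits_a, n_a - hits_a], [hits_b, n_b - hits_b]]
    row_totals = [n_a, n_b]
    col_totals = [hits_a + hits_b, (n_a - hits_a) + (n_b - hits_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                return 1.0  # degenerate table: no evidence of a difference
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def significant_contrast_sets(group_a, group_b, candidates, alpha=0.05):
    """Return (contrast_set, support_a, support_b, p) for candidates that pass.

    Bonferroni correction: each of the m tests is run at level alpha / m,
    which bounds the probability of any false positive by alpha.
    """
    if not candidates:
        return []
    threshold = alpha / len(candidates)
    found = []
    for cs in candidates:
        hits_a = count_matches(group_a, cs)
        hits_b = count_matches(group_b, cs)
        p = chi2_p_value_2x2(hits_a, len(group_a), hits_b, len(group_b))
        if p < threshold:
            found.append((cs, hits_a / len(group_a), hits_b / len(group_b), p))
    return found

def make_group(counts):
    """Build rows from {(laptop, housing): count}; hypothetical toy data."""
    rows = []
    for (laptop, housing), n in counts.items():
        rows.extend({"laptop": laptop, "housing": housing} for _ in range(n))
    return rows

# Hypothetical freshman cohorts, 100 students each.
group_1993 = make_group({("yes", "dorm"): 5, ("yes", "off"): 5,
                         ("no", "dorm"): 45, ("no", "off"): 45})
group_1998 = make_group({("yes", "dorm"): 30, ("yes", "off"): 30,
                         ("no", "dorm"): 20, ("no", "off"): 20})

candidates = [
    {"laptop": "yes"},
    {"housing": "dorm"},
    {"laptop": "yes", "housing": "dorm"},
]

results = significant_contrast_sets(group_1993, group_1998, candidates)
for cs, support_a, support_b, p in results:
    print(cs, support_a, support_b, p)
```

In this toy data the share of dorm residents is identical in both cohorts, so `{"housing": "dorm"}` is not reported, while the two conjunctions involving `laptop` are. A fuller treatment would also require a minimum difference in support, so that only differences that are large as well as statistically significant are kept.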