Exploratory Quantitative Contrast Set Mining: A Discretization Approach

Contrast sets have been shown to be a useful tool for describing differences between groups. A contrast set is a set of association rules for which the antecedents describe distinct groups, a common consequent is shared by all the rules, and support for the rules is significantly different between groups. While techniques for generating contrast sets containing categorical attributes in the consequent are straightforward, techniques for generating contrast sets containing continuous-valued attributes are not. In this paper, we describe a technique for generating contrast sets that describe the differences between two groups, where the consequent in the rules contains up to two continuous-valued attributes. We propose a modified equal-width interval binning approach to discretizing continuous-valued attributes, where the approximate width of the desired intervals is provided as a parameter to the model. We also propose an objective measure for identifying and ranking the potentially interesting contrast sets. Experimental results demonstrate the effectiveness of our approach and the utility of the interest measure.
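
To make the discretization idea concrete, the following is a minimal sketch of plain equal-width binning driven by an approximate interval width, together with a per-interval support difference between two groups. It is an illustration only: the abstract does not specify how the proposed binning is modified or how the interest measure is defined, so the function names (`equal_width_bins`, `assign_bin`, `support_difference`), the rule for choosing the number of bins, and the sample data are all assumptions, not the method of the paper.

```python
def equal_width_bins(values, approx_width):
    """Split the range of `values` into equal-width intervals.

    Assumption: the number of bins is chosen so that the actual width
    is as close as possible to the requested `approx_width`.
    """
    if approx_width <= 0:
        raise ValueError("approx_width must be positive")
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [(lo, hi)]
    n_bins = max(1, round(span / approx_width))
    width = span / n_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]


def assign_bin(value, bins):
    """Return the index of the half-open interval [left, right) holding `value`.

    Values not matched by any half-open interval (for example the overall
    maximum) are placed in the last bin.
    """
    for i, (left, right) in enumerate(bins):
        if left <= value < right:
            return i
    return len(bins) - 1


def support_difference(group_a, group_b, bins, bin_index):
    """Difference in support for one interval between two groups.

    Support here is simply the fraction of a group's values that fall
    in the interval; significance testing is not shown.
    """
    supp_a = sum(assign_bin(v, bins) == bin_index for v in group_a) / len(group_a)
    supp_b = sum(assign_bin(v, bins) == bin_index for v in group_b) / len(group_b)
    return supp_a - supp_b


# Hypothetical data: one continuous attribute observed in two groups.
group_a = [1, 2, 3, 9]
group_b = [8, 9, 9.5, 10]
bins = equal_width_bins(group_a + group_b, approx_width=3)
diff_first_bin = support_difference(group_a, group_b, bins, 0)
```

A large absolute support difference for an interval marks it as a candidate for a contrast set; whether the difference is significant would still need a statistical test, which this sketch leaves out.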
