COSINE: A Vertical Group Difference Approach to Contrast Set Mining

Contrast sets have been shown to be a useful mechanism for describing differences between groups. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The groups are defined by a selected property that distinguishes one from another (e.g., customers who default on their mortgage versus those who do not). In this paper, we propose a new search algorithm that uses a vertical approach to mine maximal contrast sets from categorical and quantitative data. For continuous-valued attributes, we use a novel yet simple discretization technique, akin to simple binning. Our experiments on real datasets demonstrate that our approach is more efficient than two previously proposed algorithms and more effective at filtering interesting contrast sets.
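To make the definition concrete, the following minimal Python sketch shows how a candidate contrast set can be scored with a vertical data layout, in which each attribute-value pair maps to the set of row ids that contain it. It is an illustration only, not the COSINE algorithm: the toy rows, the group labels, and the helper names build_tidsets and group_supports are all hypothetical, and it compares raw support across groups rather than applying a statistical significance test.

```python
# Toy data (hypothetical): each row is (group label, {attribute: value}).
rows = [
    ("default", {"income": "low", "term": "30y"}),
    ("default", {"income": "low", "term": "15y"}),
    ("default", {"income": "high", "term": "30y"}),
    ("repaid", {"income": "high", "term": "30y"}),
    ("repaid", {"income": "high", "term": "15y"}),
    ("repaid", {"income": "low", "term": "30y"}),
    ("repaid", {"income": "high", "term": "30y"}),
]


def build_tidsets(rows):
    """Vertical layout: map each (attribute, value) pair to the row ids holding it."""
    tidsets = {}
    for tid, (_, items) in enumerate(rows):
        for pair in items.items():
            tidsets.setdefault(pair, set()).add(tid)
    return tidsets


def group_supports(candidate, tidsets, rows):
    """Return the fraction of each group's rows that match every pair in candidate.

    candidate must be a non-empty list of (attribute, value) pairs.
    """
    # A conjunction of pairs corresponds to the intersection of their row-id sets.
    matching = set.intersection(*(tidsets.get(pair, set()) for pair in candidate))
    sizes, hits = {}, {}
    for tid, (group, _) in enumerate(rows):
        sizes[group] = sizes.get(group, 0) + 1
        if tid in matching:
            hits[group] = hits.get(group, 0) + 1
    return {group: hits.get(group, 0) / n for group, n in sizes.items()}


tidsets = build_tidsets(rows)
supports = group_supports([("income", "low")], tidsets, rows)
difference = abs(supports["default"] - supports["repaid"])
print(supports, difference)
```

In this toy data, the pair income = low matches two of the three "default" rows and one of the four "repaid" rows, so its support differs markedly between the groups. A real miner would additionally test whether such a difference is statistically significant before reporting the candidate as a contrast set.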