Finding Alternative Clusterings Using Constraints

The aim of data mining is to find novel and actionable insights. However, most algorithms find only a single explanation of the data, even though alternatives could exist. In this work, we explore a general-purpose approach to finding an alternative clustering of the data with the aid of must-link and cannot-link constraints. This problem has received little attention in the literature. Because our approach can be incorporated into the many clustering algorithms that use a distance function, it compares favorably with existing work.
