A principled and flexible framework for finding alternative clusterings

The aim of data mining is to find novel and actionable insights in data. However, most algorithms find only a single interpretation of the data, which may be neither novel nor actionable, even though alternatives could exist. The problem of finding an alternative to a given original clustering has received little attention in the literature. Current techniques (including our previous work) are unfocused in that they broadly attempt to find an alternative clustering but do not specify which properties of the original clustering should or should not be retained. In this work, we explore a principled and flexible framework for finding alternative clusterings of the data. The approach is principled because it poses a constrained optimization problem, so its exact behavior is understood. It is flexible because the user can formally specify positive and negative feedback based on the existing clustering, ranging from which clusters to keep or discard to a trade-off between alternativeness and clustering quality.
