Simultaneous Unsupervised Learning of Disparate Clusterings

Most clustering algorithms produce a single clustering for a given dataset even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a totally unsupervised manner. We propose two new approaches for this problem. In the first approach, we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings, and present an objective function to capture it. We provide an iterative “decorrelated” k-means type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Though the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high-dimensional datasets. We propose a new regularized factorial-learning framework that is more suitable for capturing the notion of disparate clusterings in modern, high-dimensional datasets. Furthermore, we provide kernelized versions of both of our algorithms. The resulting algorithms uncover multiple clusterings well and improve substantially on existing methods. We evaluate our methods on two real-world datasets: a music dataset from the text-mining domain, and a portrait dataset from the computer-vision domain. Our methods achieve substantially higher accuracy than existing factorial-learning methods as well as traditional clustering algorithms. Copyright © 2008 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 000-000, 2008
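The idea behind the first approach, trading clustering quality against decorrelation, can be made concrete with a small sketch. The abstract does not state the objective, so the form below is an assumption for illustration only: a k-means fit error for each of two clusterings, plus a penalty (weighted by a hypothetical parameter `lam`) on the squared inner products between the representative vectors of one clustering and the cluster means of the other. The function names are likewise hypothetical, the data are assumed to be centered, and every cluster is assumed to be non-empty.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def decorrelated_objective(X, labels1, labels2, reps1, reps2, lam):
    """Fit error of two clusterings plus lam times a decorrelation penalty.

    Assumed form (not taken from the abstract): reps1[i] and reps2[j] are
    representative vectors for the clusters of the first and second
    clustering. The penalty sums the squared inner products between each
    representative of one clustering and each cluster mean of the other.
    """
    # k-means style fit error, summed over both clusterings.
    fit = sum(
        sq_dist(x, reps1[a]) + sq_dist(x, reps2[b])
        for x, a, b in zip(X, labels1, labels2)
    )
    # Cluster means of each clustering (clusters assumed non-empty).
    means1 = [mean([x for x, a in zip(X, labels1) if a == i])
              for i in range(len(reps1))]
    means2 = [mean([x for x, b in zip(X, labels2) if b == j])
              for j in range(len(reps2))]
    # Penalize alignment between one clustering's representatives
    # and the other clustering's means.
    penalty = sum(dot(m2, r1) ** 2 for r1 in reps1 for m2 in means2)
    penalty += sum(dot(m1, r2) ** 2 for r2 in reps2 for m1 in means1)
    return fit + lam * penalty


# Four centered points that can be split by x-sign or by y-sign.
X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
by_x = [0, 0, 1, 1]
by_y = [0, 1, 0, 1]
reps_x = [[1.0, 0.0], [-1.0, 0.0]]
reps_y = [[0.0, 1.0], [0.0, -1.0]]

disparate = decorrelated_objective(X, by_x, by_y, reps_x, reps_y, lam=0.5)
redundant = decorrelated_objective(X, by_x, by_x, reps_x, reps_x, lam=0.5)
print(disparate, redundant)
```

In this toy example both pairs of clusterings have the same fit error, but the pair that repeats the same split is penalized, so the orthogonal pair scores lower. An iterative algorithm of the kind the abstract describes would alternate between reassigning points and updating the representatives to reduce such an objective.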

[1] Geoffrey E. Hinton, et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy, 1993, NIPS.

[2] F. Samaniego, et al. Maximum Likelihood Estimation for a Class of Multinomial Distributions Arising in Reliability, 1981.

[3] Zoubin Ghahramani, et al. Factorial Learning and the EM Algorithm, 1994, NIPS.

[4] Stanley L. Sclove, et al. Estimating the Parameters of a Convolution, 1969.

[5] David J. Kriegman, et al. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Ash A. Alizadeh, et al. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns, 2000, Genome Biology.

[7] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.

[8] Raymond J. Mooney, et al. Integrating constraints and metric learning in semi-supervised clustering, 2004, ICML.

[9] Erkki Oja, et al. Independent component analysis: algorithms and applications, 2000, Neural Networks.

[10] David G. Stork, et al. Pattern classification, 2nd Edition, 2000.

[11] Shivakumar Vaithyanathan, et al. Clustering with Model-level Constraints, 2005, SDM.

[12] Joydeep Ghosh, et al. Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions, 2002, J. Mach. Learn. Res.

[13] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[14] Gene H. Golub, et al. Matrix computations, 1983.

[15] S. S. Ravi, et al. Efficient incremental constrained clustering, 2007, KDD '07.

[16] Claire Cardie, et al. Clustering with Instance-Level Constraints, 2000, AAAI/IAAI.

[17] James Bailey, et al. COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity, 2006, Sixth International Conference on Data Mining (ICDM'06).

[18] Joshua B. Tenenbaum, et al. Separating Style and Content with Bilinear Models, 2000, Neural Computation.

[19] Thomas Hofmann, et al. Non-redundant data clustering, 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[20] E. M. L. Beale, et al. Nonlinear Programming: A Unified Approach, 1970.

[21] David G. Stork, et al. Pattern Classification, 1973.

[22] H. Sebastian Seung, et al. Algorithms for Non-negative Matrix Factorization, 2000, NIPS.

[23] Richard S. Zemel, et al. Learning Parts-Based Representations of Data, 2006, J. Mach. Learn. Res.

[24] Claire Cardie, et al. Constrained K-means Clustering with Background Knowledge, 2001, ICML.

[25] William R. Gaffey, et al. A Consistent Estimator of a Component of a Convolution, 1959.