Multi-view clustering using mixture models in subspace projections

Detecting multiple clustering solutions is an emerging research field. Although data is often multi-faceted by nature, traditional clustering methods are restricted to finding a single grouping. To overcome this limitation, methods for detecting alternative and multiple clustering solutions have been proposed. In this work, we present a Bayesian framework for multi-view clustering. We describe the data through multiple generalizations, each given by its own mixture model. Each mixture represents a specific view of the data as a mixture of Beta distributions in subspace projections. Since a mixture summarizes the clusters located in similar subspace projections, each view highlights specific aspects of the data. In addition, our model handles overlapping views, in which the mixture components compete with each other in the data generation process. To learn the distributions efficiently, we propose the algorithm MVGen, which exploits the Iterated Conditional Modes (ICM) principle and uses Bayesian model selection to trade off the complexity of the cluster model against its goodness of fit. In experiments on various real-world data sets, we demonstrate the high potential of MVGen to detect multiple, overlapping clustering views in subspace projections of the data.
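The abstract combines three ingredients: Beta mixtures restricted to a subspace projection, ICM-style optimization, and Bayesian model selection. The Python sketch below illustrates them for a single view under assumptions the abstract does not state: attributes scaled to [0, 1], dimensions outside a view's subspace modeled as Uniform(0, 1), Beta parameters estimated by the method of moments instead of maximum likelihood, hard (conditional-mode) assignments, and the BIC as the model-selection score. The function names (`beta_logpdf`, `fit_beta_moments`, `icm_fit`, `bic`) and the synthetic data are hypothetical. This is not MVGen itself, and it omits overlapping views and the competition between views.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x; x is clipped into (0, 1) for safety."""
    x = min(max(x, 1e-9), 1.0 - 1e-9)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def fit_beta_moments(values):
    """Method-of-moments (a, b) estimate (assumed stand-in for a Beta MLE)."""
    n = len(values)
    m = sum(values) / n
    v = max(sum((x - m) ** 2 for x in values) / n, 1e-6)  # variance floor
    common = max(m * (1.0 - m) / v - 1.0, 1e-3)  # keeps a, b > 0
    return m * common, (1.0 - m) * common


def component_logpdf(point, params, subspace):
    # Beta factors on the view's subspace; other dimensions are assumed
    # Uniform(0, 1), whose log-density is 0 and therefore drops out.
    return sum(beta_logpdf(point[d], *params[d]) for d in subspace)


def icm_fit(points, k, subspace, max_iter=50):
    """Hard-assignment fit: alternate parameter updates with conditional modes."""
    n = len(points)
    order = sorted(range(n), key=lambda i: points[i][subspace[0]])
    labels = [0] * n
    for rank, i in enumerate(order):  # initial split into k sorted chunks
        labels[i] = rank * k // n
    for _ in range(max_iter):
        comps = []  # (mixing weight, {dim: (a, b)}) per non-empty cluster
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                params = {d: fit_beta_moments([q[d] for q in members]) for d in subspace}
                comps.append((len(members) / n, params))
        # Conditional-mode step: each point takes its most probable component.
        new_labels = [
            max(range(len(comps)),
                key=lambda c: math.log(comps[c][0]) + component_logpdf(p, comps[c][1], subspace))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    loglik = sum(math.log(comps[lab][0]) + component_logpdf(p, comps[lab][1], subspace)
                 for p, lab in zip(points, labels))
    return comps, labels, loglik


def bic(loglik, n_components, dim, n):
    """Schwarz criterion (lower is better): 2 Beta parameters per subspace
    dimension per component plus (k - 1) free mixing weights."""
    n_params = n_components * 2 * dim + (n_components - 1)
    return -2.0 * loglik + n_params * math.log(n)


# Hypothetical data: two groups in dims 0-1, dim 2 is uniform noise.
rng = random.Random(0)
points = ([[rng.betavariate(20, 5), rng.betavariate(20, 5), rng.random()] for _ in range(100)]
          + [[rng.betavariate(5, 20), rng.betavariate(5, 20), rng.random()] for _ in range(100)])
view = [0, 1]  # assumed subspace of this view
scores = {}
for k in (1, 2, 3):
    comps, labels, ll = icm_fit(points, k, view)
    scores[k] = bic(ll, len(comps), len(view), len(points))
best_k = min(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Because the uniform density on [0, 1] equals 1, dimensions outside the subspace add nothing to a component's log-likelihood, so the choice of subspace alone decides which attributes a view explains. The BIC is only a large-sample approximation to Bayesian model evidence. Here it serves as a simplified stand-in for the model selection the abstract describes.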
