A Model-Based Approach to Rounding in Spectral Clustering

In spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to obtain the Laplacian matrix, computes the eigenvectors of the Laplacian, and partitions the data using the leading eigenvectors. The last step is sometimes referred to as rounding; it requires deciding how many leading eigenvectors to use, determining the number of clusters, and partitioning the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding how many leading eigenvectors to use, we rely not only on information contained in the leading eigenvectors themselves, but also on subsequent eigenvectors. Third, our method is model-based and solves all three subproblems of rounding using a class of graphical models called latent tree models. We evaluate the method on both synthetic and real-world data. The results show that it recovers the correct partition in the ideal case, where between-cluster similarity is 0, and degrades gracefully as one moves away from that case.
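For readers unfamiliar with the pipeline the abstract describes, the following is a minimal sketch of standard spectral clustering in the style of Ng, Jordan and Weiss: Gaussian similarity matrix, symmetric normalized Laplacian, leading eigenvectors, and a rounding step. The rounding here is plain k-means with a fixed, user-supplied number of clusters; it is *not* the latent-tree-model rounding proposed in the paper, and the kernel width `sigma` and farthest-point initialization are illustrative choices, not the paper's.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Standard spectral-clustering pipeline (illustrative sketch).

    The rounding step is ordinary k-means on the embedded rows, with the
    number of clusters k given in advance -- the assumption the paper relaxes.
    """
    n = len(X)
    # 1. Similarity matrix via a Gaussian kernel (sigma is a free parameter).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # 3. The k leading eigenvectors are those with the smallest eigenvalues
    #    of L; np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Row-normalize the spectral embedding before rounding.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # 4. Rounding: k-means with farthest-point initialization for stability.
    centers = U[[0]]
    for _ in range(1, k):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, U[dists.argmax()]])
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = U[labels == j].mean(0)
    return labels
```

In the ideal case the abstract mentions, where between-cluster similarity is 0, the Laplacian is block-diagonal and the leading eigenvectors are (up to rotation) cluster indicators, so even this simple rounding recovers the partition exactly; the paper's contribution concerns what to do away from that ideal case.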
