Spectral Clustering with Automatic Cluster-Number Identification via Finding Sparse Eigenvectors

Spectral clustering is an empirically successful approach to partitioning a dataset into groups of possibly complex shape, based on pairwise affinities. Identifying the number of clusters automatically remains an open problem, although many heuristics have been proposed. This paper proposes imposing sparsity on the eigenvectors of the graph Laplacian to obtain reasonable approximations of the so-called cluster-indicator vectors, from which both the clusters and the number of clusters are identified. The proposed algorithm has low computational complexity because it computes only a relevant subset of the eigenvectors. It also achieves better clustering quality than existing methods, as shown by simulations using nine real datasets.
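The link between sparse eigenvectors and cluster-indicator vectors can be illustrated in the idealized case where the affinity graph splits into disconnected components. The sketch below is not the paper's algorithm: it assumes a noise-free block affinity matrix, uses the unnormalized Laplacian, computes a full eigendecomposition for simplicity, and recovers sparse vectors from the projector onto the null space instead of solving a sparsity-regularized problem. The names `block_affinity` and `sparse_indicators` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def block_affinity(sizes):
    """Idealized affinity: 1 inside each group, 0 across groups (assumption)."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for s in sizes:
        W[start:start + s, start:start + s] = 1.0
        start += s
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W

def sparse_indicators(W, tol=1e-8):
    """Return (cluster count, labels) from the Laplacian null space."""
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    k = int(np.sum(eigvals < tol))           # zero eigenvalues = components
    U = eigvecs[:, :k]                       # arbitrary (dense) null-space basis
    P = U @ U.T                              # projector; its columns are sparse
    labels = -np.ones(len(W), dtype=int)
    next_label = 0
    for i in range(len(W)):
        if labels[i] < 0:
            # Column i of P is nonzero only on the cluster containing point i.
            support = np.abs(P[:, i]) > tol
            labels[support] = next_label
            next_label += 1
    return k, labels

k, labels = sparse_indicators(block_affinity([4, 3, 5]))
print(k, labels)
```

In this ideal setting the eigenvectors returned by the solver are generally dense mixtures of the indicator vectors, while the sparsest vectors in the same subspace are the indicators themselves. With real, noisy affinities the zero eigenvalues and exact supports disappear, which is where an explicit sparsity-promoting formulation becomes necessary.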
