Robust Self-Tuning Sparse Subspace Clustering

Sparse subspace clustering (SSC) is an effective approach to clustering high-dimensional data. However, adaptively selecting the number of clusters (and hence eigenvectors) for different data sets, especially when the data are corrupted by noise, remains a major challenge in SSC and an open problem in the field of data mining. In this paper, building on the fact that eigenvectors are robust to noise, we develop a self-adaptive search method that selects the number of clusters for SSC by exploiting the cluster-separation information carried by the eigenvectors. Our method solves the problem by identifying the cluster centers over the eigenvectors. We first design a new density-based metric, called the centrality coefficient gap, to measure this separation information, and estimate the cluster centers by maximizing the gap. Once the cluster centers are found, it is straightforward to assign each remaining point to the cluster that contains its nearest neighbor of higher density. This leads to a new clustering algorithm in which the final, randomly initialized k-means stage of traditional SSC is eliminated. We theoretically verify the correctness of the proposed method on noise-free data. Extensive experiments on synthetic and real-world data corrupted by noise demonstrate the robustness and effectiveness of the proposed method compared with well-established competitors.
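The center-then-assign procedure described above can be illustrated with a minimal sketch. The abstract does not define the centrality coefficient gap, so the code below is not the proposed metric: as a stand-in it uses the density-times-distance score from density-peaks clustering and picks the number of centers at the largest drop in the sorted scores. The function name `density_peak_clusters`, the Gaussian-kernel density, and the 10th-percentile default for `dc` are all assumptions made for illustration. The rows of `X` are assumed to be points in an eigenvector embedding.

```python
import numpy as np

def density_peak_clusters(X, dc=None):
    """Cluster the rows of X without fixing the number of clusters.

    Stand-in for the paper's method: centers are chosen by the largest gap
    in a density * distance score, NOT by the centrality coefficient gap.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if dc is None:
        # Assumed default kernel width: 10th percentile of pairwise distances.
        dc = np.percentile(D[np.triu_indices(n, k=1)], 10)
    # Gaussian-kernel local density (the self-contribution of 1 is removed).
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0

    order = np.argsort(-rho, kind="stable")  # densest point first
    delta = np.zeros(n)                      # distance to nearest denser point
    parent = np.full(n, -1)                  # index of that denser point
    delta[order[0]] = D[order[0]].max()      # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j

    # Rank candidates by score and cut at the largest drop.
    gamma = rho * delta
    ranked = np.argsort(-gamma, kind="stable")
    gaps = gamma[ranked][:-1] - gamma[ranked][1:]
    k = int(np.argmax(gaps)) + 1
    centers = ranked[:k]

    # Each remaining point joins the cluster of its nearest denser neighbor.
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    for i in order:  # parents are always visited before their children
        if labels[i] < 0:
            labels[i] = labels[parent[i]] if parent[i] >= 0 else 0
    return labels, centers
```

Because points are labeled in order of decreasing density, every point's nearest denser neighbor is already labeled when the point is reached, so no k-means step or random initialization is involved.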
