Large Scale Spectral Clustering Via Landmark-Based Sparse Representation

Spectral clustering is one of the most popular clustering approaches. However, applying spectral clustering to large-scale problems is not trivial because of its O(n³) computational complexity, where n is the number of samples. Recently, many approaches have been proposed to accelerate spectral clustering. Unfortunately, these methods usually sacrifice a considerable amount of information in the original data, which degrades performance. In this paper, we propose a novel approach, called landmark-based spectral clustering, for large-scale clustering problems. Specifically, we select p (≪ n) representative data points as landmarks and represent the original data points as sparse linear combinations of these landmarks. The spectral embedding of the data can then be computed efficiently from the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach compared to state-of-the-art methods.
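
The pipeline described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's reference implementation: the abstract does not specify how landmarks are chosen or how the sparse weights are computed, so the sketch assumes uniformly random landmark selection, Gaussian-kernel weights restricted to the r nearest landmarks, and a degree normalization followed by an eigendecomposition of a p × p matrix. The function name and the parameters r, k, sigma, and seed are hypothetical.

```python
import numpy as np


def landmark_spectral_embedding(X, p, r, k, sigma=1.0, seed=0):
    """Return an (n, k) spectral embedding built from p landmarks.

    Sketch only: landmark selection, kernel, and normalization are
    assumptions, not details taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Assumption: landmarks are drawn uniformly at random from the data.
    landmarks = X[rng.choice(n, size=p, replace=False)]

    # Squared distances from every point to every landmark, shape (n, p).
    d2 = (
        (X ** 2).sum(axis=1)[:, None]
        - 2.0 * X @ landmarks.T
        + (landmarks ** 2).sum(axis=1)[None, :]
    )
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off

    # Sparse representation: keep only the r nearest landmarks per point.
    nearest = np.argpartition(d2, r - 1, axis=1)[:, :r]
    rows = np.arange(n)[:, None]
    d2_near = d2[rows, nearest]

    # Assumption: Gaussian-kernel weights, normalized to sum to 1 per point.
    # Subtracting the row minimum avoids underflow and cancels on normalizing.
    w = np.exp(-(d2_near - d2_near.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)

    # Z is stored densely here for brevity; it has only r nonzeros per row.
    Z = np.zeros((n, p))
    Z[rows, nearest] = w

    # Normalize each landmark column by the square root of its total weight.
    deg = Z.sum(axis=0)
    deg[deg == 0] = 1.0  # landmarks that no point selected
    Z_hat = Z / np.sqrt(deg)

    # Eigendecompose the small p x p matrix instead of an n x n affinity.
    evals, evecs = np.linalg.eigh(Z_hat.T @ Z_hat)
    top = np.argsort(evals)[::-1][:k]
    sing = np.sqrt(np.maximum(evals[top], 1e-12))

    # Left singular vectors of Z_hat give the n x k embedding.
    return Z_hat @ evecs[:, top] / sing
```

Under these assumptions the dominant costs are the n × p distance computation and the p × p eigendecomposition, so the work grows linearly in n for fixed p. The rows of the returned embedding would then be passed to a standard clustering step such as k-means to obtain the final cluster labels.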
