A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ2-norm-based representation, and have achieved the state-of-the-art performance. However, these methods have suffered from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which make those methods inefficient for solving the large-scale problems. Second, they cannot cope with the out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes the representation-based subspace clustering algorithms feasible to cluster both the out-of-sample and the large-scale data. Under our framework, the large-scale problem is tackled by converting it as the out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.

[1]  Tarald O. Kvålseth,et al.  Entropy and Correlation: Some Comments , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Zhixun Su,et al.  Fixed-rank representation for unsupervised visual learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Thomas S. Huang,et al.  A Max-Margin Perspective on Sparse Representation-Based Classification , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Ling Huang,et al.  Fast approximate spectral clustering , 2009, KDD.

[5]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[6]  Michael K. Ng,et al.  Dictionary Learning-Based Subspace Structure Identification in Spectral Clustering , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Yongsheng Ding,et al.  Low-Rank Kernel Matrix Factorization for Large-Scale Evolutionary Clustering , 2012, IEEE Transactions on Knowledge and Data Engineering.

[8]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[9]  Zhang Yi,et al.  Scalable Sparse Subspace Clustering , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Feiping Nie,et al.  Discriminative Embedded Clustering: A Framework for Grouping High-Dimensional Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[11]  Yuxiao Hu,et al.  Face recognition using Laplacianfaces , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  René Vidal,et al.  A closed form solution to robust subspace estimation and clustering , 2011, CVPR 2011.

[13]  Fevzi Alimo Methods of Combining Multiple Classiiers Based on Diierent Representations for Pen-based Handwritten Digit Recognition , 1996 .

[14]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[15]  Dong Xu,et al.  Human Gait Recognition Using Patch Distribution Feature and Locality-Constrained Group Sparse Representation , 2012, IEEE Transactions on Image Processing.

[16]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[17]  Daniel D. Lee,et al.  Grassmann discriminant analysis: a unifying view on subspace-based learning , 2008, ICML '08.

[18]  Shuicheng Yan,et al.  Latent Low-Rank Representation for subspace segmentation and feature extraction , 2011, 2011 International Conference on Computer Vision.

[19]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[20]  Mohammed Bennamoun,et al.  Linear Regression for Face Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Zhixun Su,et al.  Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation , 2011, NIPS.

[22]  Lei Zhang,et al.  Sparse representation or collaborative representation: Which helps face recognition? , 2011, 2011 International Conference on Computer Vision.

[23]  Martin J. Wainwright,et al.  Estimation of (near) low-rank matrices with noise and high-dimensional scaling , 2009, ICML.

[24]  A. Martínez,et al.  The AR face databasae , 1998 .

[25]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[26]  Hans-Peter Kriegel,et al.  Subspace clustering , 2012, WIREs Data Mining Knowl. Discov..

[27]  Edward Y. Chang,et al.  Parallel Spectral Clustering in Distributed Systems , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Rong Jin,et al.  Approximate kernel k-means: solution to large scale kernel clustering , 2011, KDD.

[29]  Philip S. Yu,et al.  Top 10 algorithms in data mining , 2007, Knowledge and Information Systems.

[30]  John Wright,et al.  Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Liang-Tien Chia,et al.  Sparse Representation With Kernels , 2013, IEEE Transactions on Image Processing.

[32]  Aleix M. Martinez,et al.  The AR face database , 1998 .

[33]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[34]  Ben Shneiderman,et al.  Analyzing Social Media Networks with NodeXL: Insights from a Connected World , 2010 .

[35]  Francesco Masulli,et al.  A survey of kernel and spectral methods for clustering , 2008, Pattern Recognit..

[36]  B. V. K. Vijaya Kumar,et al.  Generalized Transitive Distance with Minimum Spanning Random Forest , 2015, IJCAI.

[37]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Ulrike von Luxburg,et al.  Limits of Spectral Clustering , 2004, NIPS.

[39]  Mohamed-Ali Belabbas,et al.  Spectral methods in machine learning and new strategies for very large datasets , 2009, Proceedings of the National Academy of Sciences.

[40]  Zhang Yi,et al.  Constructing L2-Graph For Subspace Learning and Segmentation , 2012, ArXiv.

[41]  David Zhang,et al.  Relaxed collaborative representation for pattern classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Tal Hassner,et al.  Multiple One-Shots for Utilizing Class Label Information , 2009, BMVC.

[43]  Yong Yu,et al.  Robust Recovery of Subspace Structures by Low-Rank Representation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  René Vidal,et al.  Sparse Subspace Clustering: Algorithm, Theory, and Applications , 2012, IEEE transactions on pattern analysis and machine intelligence.

[45]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[46]  Denis J. Dean,et al.  Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables , 1999 .

[47]  George Karypis,et al.  Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering , 2004, Machine Learning.

[48]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Zhang Yi,et al.  Robust Subspace Clustering via Thresholding Ridge Regression , 2015, AAAI.

[50]  Ehsan Elhamifar,et al.  Sparse subspace clustering , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Kotagiri Ramamohanarao,et al.  Approximate pairwise clustering for large data sets via sampling plus extension , 2011, Pattern Recognit..

[52]  YanShuicheng,et al.  Learning with l1-graph for image analysis , 2010 .

[53]  Ming-Syan Chen,et al.  Density Conscious Subspace Clustering for High-Dimensional Data , 2010, IEEE Transactions on Knowledge and Data Engineering.

[54]  Xin Zhang,et al.  Fast Low-Rank Subspace Segmentation , 2014, IEEE Transactions on Knowledge and Data Engineering.

[55]  Shuicheng Yan,et al.  Learning With $\ell ^{1}$-Graph for Image Analysis , 2010, IEEE Transactions on Image Processing.

[56]  Gunnar Rätsch,et al.  An introduction to kernel-based learning algorithms , 2001, IEEE Trans. Neural Networks.

[57]  Allen Y. Yang,et al.  Fast ℓ1-minimization algorithms and an application in robust face recognition: A review , 2010, 2010 IEEE International Conference on Image Processing.

[58]  Kadim Tasdemir,et al.  Vector quantization based approximate spectral clustering of large datasets , 2012, Pattern Recognit..

[59]  Ethem Alpaydin,et al.  Combining multiple representations and classifiers for pen-based handwritten digit recognition , 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.

[60]  Dong Xu,et al.  FaLRR: A fast low rank representation solver , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  David J. Kriegman,et al.  From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[62]  Pavel Pudil,et al.  Introduction to Statistical Pattern Recognition , 2006 .

[63]  M. R. Osborne,et al.  A new approach to variable selection in least squares problems , 2000 .

[64]  Dong Xu,et al.  Weighted Block-Sparse Low Rank Representation for Face Clustering in Videos , 2014, ECCV.

[65]  Shuicheng Yan,et al.  Neighborhood preserving embedding , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[66]  Zhang Yi,et al.  Learning locality-constrained collaborative representation for robust face recognition , 2012, Pattern Recognit..

[67]  Ivor W. Tsang,et al.  Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering , 2011, IEEE Transactions on Neural Networks.

[68]  Shuicheng Yan,et al.  Robust and Efficient Subspace Segmentation via Least Squares Regression , 2012, ECCV.

[69]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[70]  Xinlei Chen,et al.  Large Scale Spectral Clustering with Landmark-Based Representation , 2011, AAAI.

[71]  Shuicheng Yan,et al.  Active Subspace: Toward Scalable Low-Rank Learning , 2012, Neural Computation.

[72]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[73]  Jiawei Han,et al.  Document clustering using locality preserving indexing , 2005, IEEE Transactions on Knowledge and Data Engineering.

[74]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  Yong Yu,et al.  Robust Subspace Segmentation by Low-Rank Representation , 2010, ICML.

[76]  René Vidal,et al.  Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.