A divide-and-conquer framework for large-scale subspace clustering

Given data that lies in a union of low-dimensional subspaces, subspace clustering aims to learn, in an unsupervised manner, which subspace each data point belongs to. State-of-the-art subspace clustering methods typically adopt a two-step procedure. In the first step, an affinity measure among data points is constructed, usually by exploiting some form of data self-representation. In the second step, spectral clustering is applied to the affinity measure to recover the subspace membership of each data point. While such methods are broadly applicable to mid-size datasets with 10,000 data points in 10,000 variables, they cannot be directly applied to large-scale datasets. This paper proposes a divide-and-conquer framework for large-scale subspace clustering. The data is first divided into chunks, and subspace clustering is applied to each chunk. After potential outliers are removed from each cluster, a new cross-representation measure of the similarity between subspaces is used to merge clusters from different chunks that correspond to the same subspace. A self-representation method is then used to assign the outliers to clusters. We evaluate the proposed strategy on a synthetic large-scale dataset with 1,000,000 data points, as well as on the MNIST database, which contains 70,000 images of handwritten digits. The numerical results highlight the scalability of our approach.
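The merge step of the divide-and-conquer idea can be illustrated with a minimal Python sketch. It assumes the per-chunk clusters are already available (here they are generated directly from known subspaces rather than by running subspace clustering) and skips outlier removal and reassignment. The abstract does not define the cross-representation measure, so the ridge-regression residual below, along with the names `cross_residual` and `merge_clusters`, the regularization `reg`, and the `threshold`, are hypothetical stand-ins chosen for illustration and not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_subspace_points(basis, n):
    """Draw n unit-norm points from the span of the columns of `basis`."""
    pts = basis @ rng.standard_normal((basis.shape[1], n))
    return pts / np.linalg.norm(pts, axis=0)


def cross_residual(A, B, reg=1e-3):
    """Relative residual when the columns of A are ridge-regressed on the columns of B.

    Illustrative stand-in for a cross-representation similarity: the value is
    near 0 when A lies in the span of B and near 1 when the spans are unrelated.
    """
    # Closed-form solution of min_C ||A - B C||_F^2 + reg * ||C||_F^2.
    gram = B.T @ B + reg * np.eye(B.shape[1])
    C = np.linalg.solve(gram, B.T @ A)
    return np.linalg.norm(A - B @ C) / np.linalg.norm(A)


def merge_clusters(clusters, threshold=0.1):
    """Merge clusters whose mutual cross residual is below `threshold` (union-find)."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            # Require each cluster to represent the other well.
            r = max(cross_residual(clusters[i], clusters[j]),
                    cross_residual(clusters[j], clusters[i]))
            if r < threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(clusters))]


# Toy setup (assumed sizes): 3 subspaces of dimension 4 in R^30, seen in 2 chunks.
ambient_dim, sub_dim, n_subspaces, n_chunks, pts_per_cluster = 30, 4, 3, 2, 40
bases = [np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))[0]
         for _ in range(n_subspaces)]

clusters, truth = [], []
for _ in range(n_chunks):
    for k in range(n_subspaces):
        clusters.append(sample_subspace_points(bases[k], pts_per_cluster))
        truth.append(k)

merged = merge_clusters(clusters)
print(merged)
```

Each entry of `merged` is the index of the representative cluster for the corresponding input cluster, so clusters drawn from the same subspace in different chunks should share a value. The pairwise comparison costs time quadratic in the number of clusters, which stays small relative to the number of data points when each chunk yields only a few clusters.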
