An Easy-to-Implement Framework of Fast Subspace Clustering For Big Data Sets

Subspace clustering has attracted much attention because of its successful application to many data mining and computer vision tasks. However, most subspace clustering algorithms scale poorly and suffer from the curse of dimensionality. When the number of samples or the dimension of the data becomes large, these algorithms become infeasible because of their high computational complexity and large memory requirements. To enable fast subspace clustering on big datasets, this paper proposes a simple but effective framework called Fast Subspace Clustering (FSC), which adopts a "sampling, random projecting, clustering, and classifying" strategy. We prove that, under certain conditions on the subspaces and on the original subspace clustering algorithm, the time and space complexity of FSC are both O(MN) for M samples in an N-dimensional space. Experimental results on several real-world datasets demonstrate the effectiveness and efficiency of the proposed framework.
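The four-stage strategy named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names (`classify`, `k_subspaces`, `fsc_sketch`) are hypothetical, plain K-subspaces stands in for whichever base subspace clustering algorithm is used, a Gaussian matrix is assumed for the random projection, and out-of-sample points are assumed to be classified by a nearest-subspace rule. Samples are stored as columns, so `X` has shape (N, M).

```python
import numpy as np


def classify(Y, bases):
    """Assign each column of Y to the subspace with the smallest residual.

    For an orthonormal basis U, ||y - U U^T y||^2 = ||y||^2 - ||U^T y||^2,
    so the smallest residual is the largest projected energy.
    """
    energy = np.stack([np.sum((U.T @ Y) ** 2, axis=0) for U in bases])
    return np.argmax(energy, axis=0)


def k_subspaces(Y, k, d, rng, n_iter=30):
    """Stand-in base algorithm (assumption): alternate assignment and SVD fits."""
    n = Y.shape[0]
    # Random orthonormal initial bases from the QR factorisation of Gaussian matrices.
    bases = [np.linalg.qr(rng.standard_normal((n, d)))[0] for _ in range(k)]
    for _ in range(n_iter):
        labels = classify(Y, bases)
        for j in range(k):
            members = Y[:, labels == j]
            if members.shape[1] >= d:  # keep the old basis for tiny clusters
                U, _, _ = np.linalg.svd(members, full_matrices=False)
                bases[j] = U[:, :d]
    return classify(Y, bases), bases


def fsc_sketch(X, k, d, n_sample, n_proj, seed=0):
    """Sampling -> random projecting -> clustering -> classifying."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    # 1) Sampling: pick a small subset of the M points.
    sample_idx = rng.choice(M, size=n_sample, replace=False)
    # 2) Random projecting: Gaussian map from N down to n_proj dimensions.
    P = rng.standard_normal((n_proj, N)) / np.sqrt(n_proj)
    Y = P @ X
    # 3) Clustering: run the base algorithm on the projected samples only.
    _, bases = k_subspaces(Y[:, sample_idx], k, d, rng)
    # 4) Classifying: label every projected point by its nearest subspace.
    return classify(Y, bases)
```

The point of the design is that the expensive clustering step only ever sees `n_sample` points in `n_proj` dimensions, while the full dataset is touched only by the projection and the final classification. K-subspaces depends on its initialisation, so a call such as `fsc_sketch(X, k=2, d=2, n_sample=30, n_proj=10)` may need several seeds, or a stronger base algorithm, to give good labels.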