Distributed sketched subspace clustering for large-scale datasets

Subspace clustering has been a successful tool for unsupervised classification of high-dimensional data that are generally not linearly separable. However, state-of-the-art subspace clustering algorithms do not scale well as the number of data points increases. The present paper puts forth a distributed subspace clustering scheme for high-volume data based on random projections. Additionally, the method can cope with corrupted data. The performance of the novel scheme is assessed via numerical tests and compared with state-of-the-art subspace clustering methods.
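To make the role of random projections concrete, the following is a minimal sketch of the general idea only, not the scheme proposed in the paper. It assumes a dense Gaussian projection matrix, synthetic data drawn from two low-dimensional subspaces, and hypothetical helper names (`sketch_columns`, `make_union_of_subspaces`). It illustrates that compressing the ambient dimension with a random matrix leaves the low-rank structure of each cluster intact, so a subspace clustering algorithm could be run on the much smaller sketch.

```python
import numpy as np

def sketch_columns(X, d, seed=0):
    """Compress the D-dimensional columns of X to d dimensions.

    Uses a Gaussian random matrix (an assumption made for illustration).
    Nodes that share the same seed would draw the same matrix.
    """
    rng = np.random.default_rng(seed)
    D = X.shape[0]
    R = rng.standard_normal((d, D)) / np.sqrt(d)
    return R @ X

def make_union_of_subspaces(D, dim, n_per, n_subspaces, seed=0):
    """Synthetic data: n_per points from each of n_subspaces random subspaces."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_subspaces):
        # Orthonormal basis of a random dim-dimensional subspace of R^D.
        basis, _ = np.linalg.qr(rng.standard_normal((D, dim)))
        blocks.append(basis @ rng.standard_normal((dim, n_per)))
    return np.hstack(blocks)

# 80 points in 500 dimensions, lying on two 3-dimensional subspaces.
X = make_union_of_subspaces(D=500, dim=3, n_per=40, n_subspaces=2)
Y = sketch_columns(X, d=50)

# Each cluster keeps its rank after sketching.
rank_before = np.linalg.matrix_rank(X[:, :40])
rank_after = np.linalg.matrix_rank(Y[:, :40])
```

In a distributed setting, one plausible design is for every node to apply the same projection to its local data, so that the sketches remain comparable across nodes. That choice is an assumption of this sketch rather than a detail stated in the abstract.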
