A Theoretical Analysis of Noisy Sparse Subspace Clustering on Dimensionality-Reduced Data

Subspace clustering is the problem of partitioning unlabeled data points into clusters so that the points within each cluster lie approximately on a low-dimensional linear subspace. In many practical scenarios, the data points to be clustered are first compressed to a lower dimension because of constraints on measurement, computation, or privacy. In this paper, we study the theoretical properties of a popular subspace clustering algorithm, sparse subspace clustering (SSC), and establish formal success conditions for SSC on dimensionality-reduced data. Our analysis applies to the most general, fully deterministic model, in which both the underlying subspaces and the data points within each subspace are deterministically positioned. It also covers a wide range of dimensionality reduction techniques (e.g., Gaussian random projection, uniform subsampling, and sketching) that fall into a subspace embedding framework. Finally, we apply our analysis to a differentially private SSC algorithm and establish both privacy and utility guarantees for the proposed method.
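As a rough illustration of the pipeline described above, the following sketch compresses data with a Gaussian random projection and then runs the Lasso-style self-regression commonly used in noisy SSC. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the regularization value `lam`, the problem sizes, and the simple proximal-gradient (ISTA) solver are all choices made for this example, and the final spectral clustering step is omitted.

```python
import numpy as np

def gaussian_projection(X, p, rng):
    """Map the columns of X (d x N) to p dimensions with an i.i.d. N(0, 1/p) matrix."""
    d = X.shape[0]
    Psi = rng.normal(scale=1.0 / np.sqrt(p), size=(p, d))
    return Psi @ X

def lasso_ista(A, y, lam, n_iter=200):
    """Approximately minimise ||c||_1 + (lam / 2) * ||y - A c||_2^2 with ISTA.

    ISTA is an assumption of this sketch; any Lasso solver could be used.
    """
    L = lam * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth term
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - lam * (A.T @ (A @ c - y)) / L            # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - 1.0 / L, 0.0)  # soft threshold
    return c

def ssc_similarity(X, lam):
    """Regress each column on all the others and return the graph |C| + |C|^T."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
    N = X.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)  # exclude the point itself
        C[others, i] = lasso_ista(X[:, others], X[:, i], lam)
    return C, np.abs(C) + np.abs(C).T

# Toy data: two 3-dimensional subspaces in R^50, ten noisy points on each.
rng = np.random.default_rng(0)
d, r, n_per, p = 50, 3, 10, 10
blocks = []
for _ in range(2):
    basis = np.linalg.qr(rng.normal(size=(d, r)))[0]
    blocks.append(basis @ rng.normal(size=(r, n_per)))
X = np.hstack(blocks) + 0.01 * rng.normal(size=(d, 2 * n_per))

X_reduced = gaussian_projection(X, p, rng)  # compress from 50 to 10 dimensions
C, W = ssc_similarity(X_reduced, lam=50.0)
```

The matrix `W` would normally be passed to spectral clustering to obtain the final partition. Excluding each point from its own regression keeps the diagonal of `C` at zero, which rules out the trivial self-representation.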
