$l_p$-Recovery of the Most Significant Subspace Among Multiple Subspaces with Outliers

We assume data sampled from a mixture of $d$-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with a spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the $l_p$-averaged distances of data points from $d$-dimensional subspaces of $\mathbb{R}^D$, where $0 < p \in \mathbb{R}$. Unlike other $l_p$ minimization problems, this minimization is nonconvex for all $p > 0$ and thus requires different methods for its analysis. We show that if $0 < p \le 1$, then for any fraction of outliers, the most significant subspace can be recovered by $l_p$ minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by $l_p$ minimization for any $0 < p \le 1$, with an error proportional to the noise level. On the other hand, if $p > 1$ and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.
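The objective described above can be written down concretely: given points $x_1, \dots, x_n \in \mathbb{R}^D$ and a $d$-dimensional subspace $L$, it is the average of $\mathrm{dist}(x_i, L)^p$ over all points. A minimal NumPy sketch of this cost (not the paper's analysis or any recovery algorithm, just the quantity being minimized; the function name and the orthonormal-basis parameterization of the subspace are our own choices for illustration):

```python
import numpy as np

def lp_subspace_cost(X, B, p=1.0):
    """l_p-averaged distance of the rows of X (n x D) to the
    d-dimensional subspace spanned by the orthonormal columns
    of B (D x d): mean over i of dist(x_i, span(B))**p."""
    # Orthogonal projection of each point onto the subspace.
    proj = X @ B @ B.T
    # Euclidean distance of each point to its projection.
    dists = np.linalg.norm(X - proj, axis=1)
    return np.mean(dists ** p)
```

For points lying exactly on the subspace the cost is zero, and for a generic unrelated subspace it is bounded away from zero; recovering the most significant subspace amounts to minimizing this nonconvex function over the Grassmannian of $d$-dimensional subspaces of $\mathbb{R}^D$.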
