Coherence Pursuit: Fast, Simple, and Robust Principal Component Analysis

This paper presents a remarkably simple, yet powerful, algorithm termed coherence pursuit (CoP) for robust principal component analysis (PCA). Because inliers lie in a low-dimensional subspace and are mostly correlated, an inlier is likely to have strong mutual coherence with a large number of data points. By contrast, outliers either do not admit low-dimensional structures or form small clusters. In either case, an outlier is unlikely to bear strong resemblance to a large number of data points. CoP therefore distinguishes outliers from inliers by comparing their coherence with the rest of the data points. The mutual coherences are computed by forming the Gram matrix of the normalized data points. The sought subspace is then recovered from the span of the subset of data points that exhibit strong coherence with the rest of the data. Because CoP involves only one simple matrix multiplication, it is significantly faster than state-of-the-art robust PCA algorithms. We derive analytical performance guarantees for CoP under different models for the distributions of inliers and outliers in both noise-free and noisy settings. CoP is the first robust PCA algorithm that is simultaneously non-iterative, provably robust to both unstructured and structured outliers, and tolerant of a large number of unstructured outliers.
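
The procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' reference implementation: the function name `coherence_pursuit`, the parameters `rank` and `n_keep`, the choice of the Euclidean norm to score each column of the Gram matrix, and the synthetic data are all assumptions made for illustration. Data points are assumed to be stored as the columns of the input matrix.

```python
import numpy as np


def coherence_pursuit(data, rank, n_keep):
    """Illustrative sketch: estimate a rank-`rank` subspace from the columns of `data`.

    `n_keep` (hypothetical parameter) is how many of the most coherent
    columns are used to span the estimated subspace.
    """
    # Normalize every data point (column) to unit Euclidean norm.
    norms = np.linalg.norm(data, axis=0, keepdims=True)
    normalized = data / np.maximum(norms, 1e-12)

    # Mutual coherences: Gram matrix of the normalized points,
    # with self-coherence (the diagonal) removed.
    gram = normalized.T @ normalized
    np.fill_diagonal(gram, 0.0)

    # Score each point by the norm of its coherence vector
    # (the l2 norm is an assumption; other norms could be used).
    scores = np.linalg.norm(gram, axis=0)

    # Keep the points with the strongest coherence and take their span.
    keep = np.argsort(scores)[::-1][:n_keep]
    u, _, _ = np.linalg.svd(data[:, keep], full_matrices=False)
    return u[:, :rank], scores


# Synthetic, noise-free example: 200 inliers in a 3-dimensional subspace
# of R^50, plus 100 unstructured Gaussian outliers.
rng = np.random.default_rng(0)
dim, rank, n_in, n_out = 50, 3, 200, 100
basis = np.linalg.qr(rng.standard_normal((dim, rank)))[0]
inliers = basis @ rng.standard_normal((rank, n_in))
outliers = rng.standard_normal((dim, n_out))
data = np.hstack([inliers, outliers])

est_basis, scores = coherence_pursuit(data, rank, n_keep=20)

# Distance of the inliers from the estimated subspace.
residual = np.linalg.norm(inliers - est_basis @ (est_basis.T @ inliers))
print("inlier residual:", residual)
```

In this sketch the only costly step is the single product `normalized.T @ normalized`, which mirrors the non-iterative character claimed for CoP. How many columns to keep is a tuning choice here and is not specified by the abstract.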
