ℓ1 Major Component Detection and Analysis (ℓ1 MCDA): Foundations in Two Dimensions

Principal Component Analysis (PCA) is widely used for identifying the major components of statistically distributed point clouds. Robust versions of PCA, often based in part on the ℓ1 norm (rather than the ℓ2 norm), are increasingly used, especially for point clouds with many outliers. Without additional assumptions, neither standard PCA nor robust PCAs can provide reliable information for outlier-rich point clouds or for distributions with several main directions (spokes). We carry out a fundamental and complete reformulation of the PCA approach in a framework based exclusively on the ℓ1 norm and heavy-tailed distributions. The ℓ1 Major Component Detection and Analysis (ℓ1 MCDA) that we propose can determine the main directions and the radial extent of 2D data from single or multiple superimposed Gaussian or heavy-tailed distributions, with and without patterned artificial outliers (clutter). In nearly all cases in the computational results, 2D ℓ1 MCDA has accuracy superior to that of standard PCA and of two robust PCAs, namely, the projection-pursuit method of Croux and Ruiz-Gazen and the ℓ1 factorization method of Ke and Kanade. (Standard PCA is, of course, superior to ℓ1 MCDA for Gaussian-distributed point clouds.) The computing time of ℓ1 MCDA is competitive with the computing times of the two robust PCAs.
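The sensitivity of standard PCA to patterned clutter can be illustrated with a minimal sketch. It does not implement ℓ1 MCDA or either robust PCA, and it does not reproduce the computational results. The synthetic point cloud, the clutter layout, and the helper names `unit` and `pca_main_direction_deg` are assumptions made for illustration only.

```python
import math

def unit(deg):
    """Unit vector at the given angle in degrees (hypothetical helper)."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

def pca_main_direction_deg(points):
    """Angle in [0, 180) degrees of the first principal axis of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Closed-form angle of the leading eigenvector of a 2x2 scatter matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy)) % 180.0

# Assumed synthetic data: an elongated cloud along 30 degrees with a small
# deterministic wiggle in the perpendicular (120 degree) direction.
u, v = unit(30.0), unit(120.0)
cloud = []
for i in range(201):
    t = -1.0 + 2.0 * i / 200
    w = 0.05 * math.sin(37.0 * t)
    cloud.append((t * u[0] + w * v[0], t * u[1] + w * v[1]))

# Assumed clutter: 20 distant points lined up along the 120 degree direction.
clutter = [((8.0 + 0.1 * j) * v[0], (8.0 + 0.1 * j) * v[1]) for j in range(20)]

clean_angle = pca_main_direction_deg(cloud)
cluttered_angle = pca_main_direction_deg(cloud + clutter)
print(f"main direction without clutter: {clean_angle:.1f} degrees")
print(f"main direction with clutter:    {cluttered_angle:.1f} degrees")
```

In this sketch, fewer than 10% of the points are clutter, yet they account for most of the squared-distance scatter, so the ℓ2-based principal axis follows the clutter rather than the main cloud.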

[1]  Christophe Croux,et al.  High breakdown estimators for principal components: the projection-pursuit approach revisited , 2005 .

[2]  Pierre-Antoine Absil,et al.  Principal Manifolds for Data Visualization and Dimension Reduction , 2007 .

[3]  John E. Lavery,et al.  Univariate cubic L1 splines – A geometric programming approach , 2002, Math. Methods Oper. Res..

[4]  Olivier Gibaru,et al.  Fast L1^k C^k polynomial spline interpolation algorithm with shape-preserving properties , 2011, Comput. Aided Geom. Des..

[5]  Peter Filzmoser,et al.  A comparison of algorithms for the multivariate L1-median , 2010, Comput. Stat..

[6]  I. Jolliffe Principal Component Analysis , 2005 .

[7]  John E. Lavery,et al.  Univariate Cubic L1 Interpolating Splines: Analytical Results for Linearity, Convexity and Oscillation on 5-Point Windows , 2010, Algorithms.

[8]  C. Small A Survey of Multidimensional Medians , 1990 .

[9]  S. Resnick Heavy-Tail Phenomena: Probabilistic and Statistical Modeling , 2006 .

[10]  Sven Serneels,et al.  Principal component analysis for data containing outliers and missing elements , 2008, Comput. Stat. Data Anal..

[11]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[12]  John E. Lavery,et al.  Univariate Cubic L1 Interpolating Splines: Spline Functional, Window Size and Analysis-based Algorithm , 2010, Algorithms.

[14]  Sidney Resnick,et al.  On the foundations of multivariate heavy-tail analysis , 2004, Journal of Applied Probability.

[15]  John E. Lavery,et al.  Comparison of Reconstruction and Texturing of 3D Urban Terrain by L1 Splines, Conventional Splines and Alpha Shapes , 2018, VISAPP.

[16]  T. Kanade,et al.  Robust subspace computation using L1 norm , 2003 .

[17]  John E. Lavery Univariate cubic Lp splines and shape-preserving, multiscale interpolation by univariate cubic L1 splines , 2000, Comput. Aided Geom. Des..

[18]  Anna K. Panorska,et al.  Data analysis for heavy tailed multivariate samples , 1997 .

[19]  Guoying Li,et al.  Projection-Pursuit Approach to Robust Dispersion Matrices and Principal Components: Primary Theory and Monte Carlo , 1985 .

[20]  Tao Chen,et al.  Robust probabilistic PCA with missing data and contribution analysis for outlier detection , 2009, Comput. Stat. Data Anal..

[21]  Allen Y. Yang,et al.  Estimation of Subspace Arrangements with Applications in Modeling and Segmenting Mixed Data , 2008, SIAM Rev..

[22]  Mia Hubert,et al.  Robust PCA for skewed data and its outlier map , 2009, Comput. Stat. Data Anal..

[23]  J. Nolan,et al.  Multivariate stable distributions: approximation, estimation, simulation and identification , 1998 .

[24]  Takeo Kanade,et al.  Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Nojun Kwak,et al.  Principal Component Analysis Based on L1-Norm Maximization , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.