S-Estimators for Functional Principal Component Analysis

Principal component analysis is a widely used technique that provides an optimal lower-dimensional approximation to multivariate or functional datasets. These approximations can be very useful in identifying potential outliers among high-dimensional or functional observations. In this article, we propose a new class of estimators for principal components based on robust scale estimators. For a fixed dimension q, we robustly estimate the q-dimensional linear space that provides the best prediction for the data, in the sense of minimizing the sum of robust scale estimators of the coordinates of the residuals. We also study an extension to the infinite-dimensional case. Our method is consistent for elliptical random vectors, and is Fisher consistent for elliptically distributed random elements on arbitrary Hilbert spaces. Numerical experiments show that our proposal is highly competitive when compared with other methods. We illustrate our approach on a real dataset, where the robust estimator discovers atypical observations that would have been missed otherwise. Supplementary materials for this article are available online.
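As a rough illustration of the criterion described above, the following minimal sketch evaluates, for a candidate q-dimensional subspace, the sum of squared robust scales of the residual coordinates. It is not the authors' implementation. The function names, the use of the normalized median absolute deviation (MAD) as the robust scale, and the squaring of each scale are assumptions made for illustration; the abstract states only that robust scale estimators of the residual coordinates are summed.

```python
import numpy as np


def mad_scale(r):
    """Normalized median absolute deviation about zero.

    Assumption: a simple stand-in for the robust scale estimator;
    the abstract does not specify which scale is used.
    """
    return 1.4826 * np.median(np.abs(r))


def residual_scale_objective(X, mu, B):
    """Sum of squared robust scales of the residual coordinates.

    X  : (n, p) data matrix, one observation per row
    mu : (p,) candidate center
    B  : (p, q) matrix with orthonormal columns spanning the candidate subspace
    """
    Z = X - mu                      # center the observations
    R = Z - Z @ B @ B.T             # residuals after projecting onto span(B)
    # Squaring each scale is an assumption that mirrors classical variances.
    return sum(mad_scale(R[:, j]) ** 2 for j in range(R.shape[1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=200)
    # Data concentrated along the first coordinate axis, plus small noise.
    X = 5.0 * np.outer(t, [1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(200, 3))
    X[:10, 1] += 50.0               # a few gross outliers
    mu = np.median(X, axis=0)
    B = np.array([[1.0], [0.0], [0.0]])
    print(residual_scale_objective(X, mu, B))
```

A robust fit would search over the center and the subspace for the pair that minimizes this quantity; that search is omitted here.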
