High‐dimensional principal component analysis with heterogeneous missingness

We study the problem of high-dimensional principal component analysis (PCA) with missing observations. In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence. A closer look, however, reveals two shortcomings: the empirical performance of the IPW estimator can be unsatisfactory, particularly in more realistic settings where the missingness mechanism is heterogeneous, and in the noiseless case it fails to recover the principal components exactly. Our main contribution is therefore to introduce a new method for high-dimensional PCA, called primePCA, that is designed to cope with situations where observations may be missing in a heterogeneous manner. Starting from the IPW estimator, primePCA iteratively projects the observed entries of the data matrix onto the column space of the current estimate to impute the missing entries, and then updates the estimate by computing the leading right singular space of the imputed data matrix. The interaction between the heterogeneity of the missingness and the low-dimensional structure turns out to be crucial in determining the feasibility of the problem. We therefore introduce an incoherence condition on the principal components and prove that, in the noiseless case, the error of primePCA converges to zero at a geometric rate provided the signal strength is not too small. An important feature of our theoretical guarantees is that they depend on average, as opposed to worst-case, properties of the missingness mechanism. Our numerical studies on both simulated and real data reveal that primePCA exhibits very encouraging performance across a wide range of scenarios.
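
The abstract's description of the algorithm translates directly into code. Below is a minimal NumPy sketch of the two ingredients: an IPW-style covariance estimate (with observation probabilities replaced by empirical joint-observation frequencies) used for initialization, followed by the iterative impute-and-refit loop. It assumes centred (mean-zero) data with missing entries coded as np.nan; the names ipw_covariance and prime_pca, the rank parameter k and the fixed iteration count n_iter are illustrative choices of ours, not the authors' reference implementation.

```python
import numpy as np


def ipw_covariance(Y):
    """IPW-style covariance estimate from a partially observed, centred
    n x d matrix Y (missing entries coded as np.nan).

    Each (j, k) entry of Y^T Y is rescaled by the reciprocal of the
    number of rows in which columns j and k are jointly observed,
    compensating for the missingness via empirical observation
    frequencies rather than known probabilities.
    """
    obs = ~np.isnan(Y)                                 # observation indicators
    Y0 = np.where(obs, Y, 0.0)                         # zero-fill missing entries
    counts = obs.T.astype(float) @ obs.astype(float)   # joint observation counts
    counts = np.maximum(counts, 1.0)                   # guard against empty overlaps
    return (Y0.T @ Y0) / counts


def prime_pca(Y, k, n_iter=50):
    """Sketch of the primePCA refinement: starting from the top-k
    eigenvectors of the IPW covariance estimate, alternately (i) impute
    each row by regressing its observed entries onto the current subspace
    estimate and (ii) refit the subspace as the leading right singular
    space of the imputed matrix. Returns a d x k orthonormal matrix."""
    n, d = Y.shape
    obs = ~np.isnan(Y)
    # IPW initialization: leading k eigenvectors of the weighted covariance.
    sigma_hat = ipw_covariance(Y)
    sigma_hat = (sigma_hat + sigma_hat.T) / 2          # symmetrize
    _, eigvec = np.linalg.eigh(sigma_hat)              # eigenvalues ascending
    V = eigvec[:, -k:]                                 # d x k initial estimate
    Y_imp = np.where(obs, Y, 0.0)
    for _ in range(n_iter):
        for i in range(n):
            o = obs[i]
            if not o.any():                            # nothing observed in row i
                continue
            # Least-squares projection of the observed entries onto col(V)
            # gives the row's scores; use them to impute the missing entries.
            u, *_ = np.linalg.lstsq(V[o], Y[i, o], rcond=None)
            Y_imp[i, ~o] = V[~o] @ u
        # Refit: leading k right singular vectors of the imputed matrix.
        _, _, Vt = np.linalg.svd(Y_imp, full_matrices=False)
        V = Vt[:k].T
    return V
```

In practice one would monitor convergence of the subspace estimate (e.g., via a sin-theta distance between successive iterates) rather than running a fixed number of iterations, and the paper's algorithm includes further safeguards, such as restricting the row-wise updates to rows with sufficiently many observed entries, that this sketch omits.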
