Distributed Estimation for Principal Component Analysis: An Enlarged Eigenspace Analysis

The growing size of modern datasets poses many challenges for existing statistical estimation approaches and calls for new distributed methodologies. This article studies distributed estimat...
