LARGE COVARIANCE ESTIMATION THROUGH ELLIPTICAL FACTOR MODELS

We propose a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model. To clarify how POET works, we establish a set of high-level sufficient conditions under which the procedure achieves optimal rates of convergence under different matrix norms. This framework recovers existing results for sub-Gaussian data in a more transparent way that depends only on the concentration properties of the sample covariance matrix. As a new theoretical contribution, the framework makes it possible, for the first time, to exploit the conditional sparsity structure of the covariance for heavy-tailed data. In particular, for elliptical distributions, we propose a robust estimator based on the marginal and spatial Kendall's tau that satisfies these conditions. In addition, we study the conditional graphical model under the same framework. The technical tools developed in this paper are of general interest for high-dimensional principal component analysis. Thorough numerical results are also provided to support the theory.
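The POET idea described above can be illustrated with a minimal sketch. It is not the paper's estimator. It assumes a pilot covariance estimate `sigma_hat` is already available (here the plain sample covariance, where the paper proposes Kendall's tau based estimators for heavy-tailed data), a known number of factors, and a single constant soft threshold. The function name `poet_sketch`, its parameters, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def poet_sketch(sigma_hat, n_factors, threshold):
    """Illustrative POET-style estimate: low-rank part plus thresholded residual.

    Assumptions (not from the source): a constant soft threshold applied to
    off-diagonal residual entries, and a known number of factors.
    """
    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    top = np.argsort(eigvals)[::-1][:n_factors]
    lam, vecs = eigvals[top], eigvecs[:, top]

    # Low-rank component spanned by the leading eigenvectors.
    low_rank = (vecs * lam) @ vecs.T

    # Principal orthogonal complement: what the leading factors do not explain.
    residual = sigma_hat - low_rank

    # Soft-threshold every entry, then restore the untouched diagonal.
    sparse = np.sign(residual) * np.maximum(np.abs(residual) - threshold, 0.0)
    np.fill_diagonal(sparse, np.diag(residual))

    return low_rank + sparse, low_rank, sparse

# Simulated approximate factor model: X = F B^T + U (hypothetical sizes).
rng = np.random.default_rng(0)
p, n, k = 20, 200, 2
loadings = rng.normal(size=(p, k))
factors = rng.normal(size=(n, k))
noise = rng.normal(size=(n, p))
data = factors @ loadings.T + noise

pilot = np.cov(data, rowvar=False)  # pilot estimate; a robust one could be swapped in
estimate, low_rank, sparse = poet_sketch(pilot, n_factors=k, threshold=0.1)
```

With `threshold=0` the sketch returns the pilot estimate unchanged, and with a very large threshold the residual part becomes diagonal, which corresponds to a strict factor model. In practice the threshold level and the number of factors would be chosen from the data rather than fixed by hand.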
