Rate Optimal Estimation and Confidence Intervals for High-dimensional Regression with Missing Covariates

Although the majority of the theoretical literature on high-dimensional statistics has focused on settings with fully observed data, missing values and corruptions are common in practice. We consider the problems of estimation and of constructing component-wise confidence intervals in a sparse high-dimensional linear regression model when some covariates of the design matrix are missing completely at random. We analyze a variant of the Dantzig selector [14] for estimating the regression model and use a de-biasing argument to construct component-wise confidence intervals. Our first main contribution is to establish upper bounds on the estimation error as a function of the model parameters: the sparsity level s, the expected fraction of observed covariates $\rho_*$, and a measure of the signal strength $\|\beta^*\|_2$. Somewhat surprisingly, and in contrast to the fully observed setting, we find that even in this idealized missing-completely-at-random setting there is a dichotomy in the dependence on the model parameters: much faster rates are attainable when the covariance matrix of the random design is known. To study this issue further, our second main contribution is to provide lower bounds on the estimation error showing that this discrepancy in rates is unavoidable in a minimax sense. We then turn to high-dimensional inference in the presence of missing data and construct and analyze confidence intervals based on a de-biased estimator. Here, inference is complicated by the fact that the de-biasing matrix is correlated with the pilot estimator, which necessitates the design of a new estimator and a novel analysis. We complement our mathematical study with extensive simulations on synthetic and semi-synthetic data that demonstrate the accuracy of our asymptotic predictions at finite sample sizes.
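The abstract does not spell out the estimator, but the construction it describes — a Dantzig-selector variant built from unbiased surrogates of the Gram matrix and of the cross-moment $X^\top y/n$ (which are unavailable when $X$ is only partially observed), followed by a one-step de-biasing correction — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact procedure: it assumes the covariates are missing completely at random with a known observation probability rho, uses the standard inverse-probability-weighting correction for a zero-imputed design (in the spirit of [15, 16]), and substitutes a ridge-regularized inverse for the paper's de-biasing matrix. The function names (surrogates, dantzig_missing, debias) and tuning parameters (lam, tau) are hypothetical choices, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def surrogates(Z, y, rho):
    """IPW-corrected surrogates for E[x x^T] and E[x y], computed from the
    zero-imputed design Z = X * mask under MCAR with known observation
    probability rho (an assumption of this sketch)."""
    n, _ = Z.shape
    G = Z.T @ Z / n
    Sigma_hat = G / rho**2
    # A diagonal entry is observed with probability rho, not rho^2.
    np.fill_diagonal(Sigma_hat, np.diag(G) / rho)
    gamma_hat = Z.T @ y / (n * rho)
    return Sigma_hat, gamma_hat


def dantzig_missing(Z, y, rho, lam):
    """Dantzig-type estimator: minimize ||b||_1 subject to
    ||Sigma_hat b - gamma_hat||_inf <= lam, replacing the usual constraint
    ||X^T (y - X b) / n||_inf <= lam. Solved as an LP via the split
    b = u - v with u, v >= 0."""
    _, p = Z.shape
    S, g = surrogates(Z, y, rho)
    c = np.ones(2 * p)                      # objective: sum(u) + sum(v) = ||b||_1
    A_ub = np.block([[S, -S], [-S, S]])     # encodes |S (u - v) - g| <= lam
    b_ub = np.concatenate([lam + g, lam - g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v


def debias(beta_hat, Z, y, rho, tau=0.05):
    """One-step correction beta_hat + M (gamma_hat - Sigma_hat beta_hat).
    Here M is a ridge-regularized inverse of Sigma_hat; this is a stand-in
    for the paper's de-biasing matrix, whose construction differs."""
    _, p = Z.shape
    S, g = surrogates(Z, y, rho)
    M = np.linalg.inv(S + tau * np.eye(p))
    return beta_hat + M @ (g - S @ beta_hat)


# Toy example: n = 200, p = 50, s = 5, each covariate observed w.p. rho = 0.8.
rng = np.random.default_rng(0)
n, p, s, rho = 200, 50, 5, 0.8
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)
Z = X * (rng.random((n, p)) < rho)          # zero-imputed MCAR design
beta_hat = dantzig_missing(Z, y, rho, lam=0.5)
beta_d = debias(beta_hat, Z, y, rho)
```

In this sketch the dependence on the unknown design covariance enters only through Sigma_hat; if the covariance matrix were known, it could be substituted for the surrogate directly, which is the regime in which the abstract reports much faster rates.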

[1] Larry A. Wasserman, et al. Statistical Analysis of Semi-Supervised Regression, 2007, NIPS.

[2] R. Z. Khasʹminskiĭ, et al. Statistical Estimation: Asymptotic Theory, 1981.

[3] K. Miller. On the Inverse of the Sum of Matrices, 1981.

[4] J. T. Hwang. Multiplicative Errors-in-Variables Models with Applications to Recent Data Released by the U.S. Department of Energy, 1986.

[5] Cun-Hui Zhang, et al. Confidence intervals for low dimensional parameters in high dimensional linear models, 2011, arXiv:1110.2563.

[6] Hui Zou, et al. CoCoLasso for High-dimensional Error-in-variables Regression, 2015, arXiv:1510.07123.

[7] S. van de Geer, et al. On asymptotically optimal confidence regions and tests for high-dimensional models, 2013, arXiv:1303.0518.

[8] Constantine Caramanis, et al. Regularized EM Algorithms: A Unified Framework and Statistical Guarantees, 2015, NIPS.

[9] Peter Bühlmann, et al. Pattern alternating maximization algorithm for missing data in high-dimensional problems, 2014, J. Mach. Learn. Res.

[10] T. Cai, et al. A Constrained $\ell_1$ Minimization Approach to Sparse Precision Matrix Estimation, 2011, arXiv:1102.2233.

[11] A. Tsybakov, et al. Linear and conic programming estimators in high dimensional errors-in-variables models, 2014, arXiv:1408.0241.

[12] Martin J. Wainwright, et al. Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$-Balls, 2009, IEEE Transactions on Information Theory.

[13] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[14] Terence Tao, et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.

[15] A. Tsybakov, et al. Sparse recovery under matrix uncertainty, 2008, arXiv:0812.2818.

[16] Po-Ling Loh, et al. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity, 2011, NIPS.

[17] Emmanuel J. Candès, et al. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, 2004, IEEE Transactions on Information Theory.

[18] Tengyuan Liang, et al. Geometric Inference for General High-Dimensional Linear Inverse Problems, 2014, arXiv:1404.4408.

[19] D. Rubin, et al. Statistical Analysis with Missing Data, 1989.

[20] Constantine Caramanis, et al. Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery, 2013, ICML.

[21] Sham M. Kakade, et al. A tail inequality for quadratic forms of subgaussian random vectors, 2011, arXiv.

[22] N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[23] Martin J. Wainwright. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso), 2009, IEEE Transactions on Information Theory.

[24] A. Tsybakov, et al. Improved Matrix Uncertainty Selector, 2011, arXiv:1112.4413.

[25] L. Le Cam. Asymptotic Methods in Statistical Decision Theory, 1986.

[26] S. van de Geer. Empirical Processes in M-Estimation, 2001.

[27] D. L. Donoho. Compressed sensing, 2006, IEEE Trans. Inf. Theory.

[28] Po-Ling Loh, et al. Corrupted and missing predictors: Minimax bounds for high-dimensional linear regression, 2012, IEEE International Symposium on Information Theory.

[29] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[30] Han Liu, et al. A Unified Theory of Confidence Regions and Testing for High-Dimensional Estimating Equations, 2015, Statistical Science.

[31] Alexandre B. Tsybakov, et al. An $\{\ell_1, \ell_2, \ell_\infty\}$-Regularization Approach to High-Dimensional Errors-in-variables Models, 2014.

[32] Alexander Zien, et al. Semi-Supervised Learning, 2006.

[33] P. Bickel, et al. Simultaneous analysis of Lasso and Dantzig selector, 2008, arXiv:0801.1095.

[34] Martin J. Wainwright, et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, arXiv.

[35] Peng Zhao, et al. On Model Selection Consistency of Lasso, 2006, J. Mach. Learn. Res.

[36] Zhaoran Wang, et al. High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality, 2015, NIPS.

[38] T. Tony Cai, et al. Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, 2015, arXiv:1506.05539.

[39] R. Tibshirani, et al. Least angle regression, 2004, arXiv:math/0406456.

[41] Adel Javanmard, et al. Confidence intervals and hypothesis testing for high-dimensional regression, 2013, J. Mach. Learn. Res.

[42] Francis R. Bach. Consistency of the group Lasso and multiple kernel learning, 2007, J. Mach. Learn. Res.

[43] Po-Ling Loh, et al. Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, 2013, J. Mach. Learn. Res.