Confidence intervals and hypothesis testing for high-dimensional regression

Fitting high-dimensional statistical models often requires non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates, which in turn makes it extremely challenging to quantify the uncertainty associated with a given estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance, such as confidence intervals or p-values, for these models. We consider here the high-dimensional linear regression problem and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing the null hypothesis that a given parameter vanishes, our method has nearly optimal power. Our approach is based on constructing a 'de-biased' version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume any special structure on the design matrix. We test our method on synthetic data and on a high-throughput genomic data set concerning riboflavin production rate, made publicly available by Bühlmann et al. (2014).
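
To make the de-biasing construction concrete, here is a minimal sketch on synthetic data. Starting from a Lasso estimate beta_hat, it forms the de-biased estimate beta_u = beta_hat + (1/n) M X^T (y - X beta_hat) for an approximate inverse M of the sample covariance, and reads off coordinate-wise confidence intervals and p-values from the approximate normality of beta_u. Two simplifications relative to the paper should be flagged: M is taken here to be a crude ridge-regularized inverse of the sample covariance rather than the row-wise convex program used in the paper, and the noise level is estimated from the Lasso residuals rather than by scaled-Lasso-type methods.

```python
# Minimal sketch of de-biased Lasso inference on synthetic data.
# Assumptions beyond the abstract: i.i.d. Gaussian design, a
# ridge-regularized inverse as a stand-in for the paper's choice of M,
# and a residual-based plug-in estimate of the noise level.
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 10                     # samples, dimension, sparsity
beta = np.zeros(p)
beta[:s] = 2.0                             # true (sparse) coefficient vector
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# Step 1: regularized estimator, with the usual sqrt(log p / n) rate.
lam = 2.0 * np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_

# Step 2: approximate inverse M of the sample covariance Sigma_hat
# (crude ridge surrogate for the paper's coordinate-wise convex program).
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.1 * np.eye(p))

# Step 3: de-biased estimate beta_u = beta_hat + (1/n) M X^T (y - X beta_hat).
resid = y - X @ beta_hat
beta_u = beta_hat + M @ X.T @ resid / n

# Step 4: plug-in noise level and coordinate-wise standard errors.
df = int(np.sum(beta_hat != 0))
sigma_hat = np.sqrt(resid @ resid / max(n - df, 1))
se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)

# 95% confidence intervals and two-sided p-values for H0: beta_i = 0.
z = stats.norm.ppf(0.975)
ci_lo, ci_hi = beta_u - z * se, beta_u + z * se
pvals = 2.0 * stats.norm.sf(np.abs(beta_u) / se)
print("empirical coverage of the 95% CIs:", np.mean((beta >= ci_lo) & (beta <= ci_hi)))
print("rejections at the 5% level:", int(np.sum(pvals < 0.05)))
```

On a Gaussian design of this size, the empirical coverage printed at the end should land near the nominal 95%, though the simplified choice of M typically yields somewhat wider intervals than the nearly optimal ones obtained from the paper's construction.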

[1] R. Tibshirani, et al. A Study of Error Variance Estimation in Lasso Regression, 2013, arXiv:1311.5274.

[2] Ji Zhu, et al. Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer, 2008, The Annals of Applied Statistics.

[3] Sara van de Geer, et al. Statistical Theory for High-Dimensional Models, 2014, arXiv:1409.8557.

[4] Peter Bühlmann, et al. p-Values for High-Dimensional Regression, 2008, arXiv:0811.2177.

[5] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[6] Lu Tian, et al. A Perturbation Method for Inference on Regularized Regression Estimates, 2011, Journal of the American Statistical Association.

[7] Peng Zhao, et al. On Model Selection Consistency of Lasso, 2006, J. Mach. Learn. Res.

[8] Qi Zhang, et al. Optimality of graphlet screening in high dimensional variable selection, 2012, J. Mach. Learn. Res.

[9] Xiaoming Huo, et al. Uncertainty principles and ideal atomic decomposition, 2001, IEEE Trans. Inf. Theory.

[10] Yichao Wu, et al. Ultrahigh Dimensional Feature Selection: Beyond The Linear Model, 2009, J. Mach. Learn. Res.

[11] Andrea Montanari, et al. Confidence intervals and hypothesis testing for high-dimensional regression, 2014.

[12] R. Tibshirani. A significance test for the lasso, 2014.

[13] Peter Bühlmann, et al. High-Dimensional Statistics with a View Toward Applications in Biology, 2014.

[14] Jianqing Fan, et al. Variance estimation using refitted cross-validation in ultrahigh dimensional regression, 2010, Journal of the Royal Statistical Society, Series B (Statistical Methodology).

[15] E. Candès, et al. Near-ideal model selection by ℓ1 minimization, 2008, arXiv:0801.0345.

[16] L. Wasserman, et al. High dimensional variable selection, 2007, Annals of Statistics.

[17] Jiashun Jin, et al. Partial Correlation Screening for Estimating Large Precision Matrices, with Applications to Classification, 2014, arXiv:1409.3301.

[18] Adel Javanmard, et al. Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models, 2013.

[19] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[20] Kengo Kato, et al. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors, 2012, arXiv:1212.6906.

[21] Martin J. Wainwright. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso), 2009, IEEE Transactions on Information Theory.

[22] Adel Javanmard, et al. Nearly optimal sample size in hypothesis testing for high-dimensional regression, 2013, 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[23] Christopher R. Genovese, et al. Asymptotic theory for density ridges, 2014, arXiv:1406.5663.

[24] Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), 2009.

[25] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty, 2010, arXiv:1002.4734.

[26] P. Bühlmann. Statistical significance in high-dimensional linear models, 2013.

[27] S. van de Geer, et al. On the conditions used to prove oracle results for the Lasso, 2009, arXiv:0910.0722.

[28] R. Tibshirani, et al. A significance test for the lasso, 2013, Annals of Statistics.

[29] Emmanuel J. Candès. Decoding by linear programming, 2005, IEEE Transactions on Information Theory.

[30] Scott Chen, et al. Examples of basis pursuit, 1995, Optics + Photonics.

[31] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[32] P. Bickel, et al. Simultaneous analysis of Lasso and Dantzig selector, 2008, arXiv:0801.1095.

[33] S. van de Geer, et al. Confidence intervals for high-dimensional inverse covariance estimation, 2014, arXiv:1403.6752.

[34] Patrick Seemann. Matrix Factorization Techniques for Recommender Systems, 2014.

[35] E. Lehmann. Testing Statistical Hypotheses, 1960.

[36] Shuheng Zhou. Reconstruction from Anisotropic Random Measurements, 2012, 25th Annual Conference on Learning Theory (COLT).

[37] M. Lustig, et al. Compressed Sensing MRI, 2008, IEEE Signal Processing Magazine.

[38] S. van de Geer, et al. ℓ1-penalization for mixture regression models, 2010, arXiv:1202.6046.

[39] Sara van de Geer, et al. Statistics for High-Dimensional Data: Methods, Theory and Applications, 2011.

[40] Yehuda Koren, et al. Matrix Factorization Techniques for Recommender Systems, 2009, Computer.

[41] Adel Javanmard, et al. Hypothesis Testing in High-Dimensional Regression Under the Gaussian Random Design Model: Asymptotic Theory, 2013, IEEE Transactions on Information Theory.

[42] N. S. Barnett, et al. Private communication, 1969.

[43] Peter Bühlmann. Statistical significance in high-dimensional linear models, 2012, arXiv:1202.1377.

[44] S. van de Geer, et al. On asymptotically optimal confidence regions and tests for high-dimensional models, 2013, arXiv:1303.0518.

[45] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.

[46] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[47] Jianqing Fan, et al. Sure independence screening for ultrahigh dimensional feature space, 2006, arXiv:math/0612857.

[48] N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[49] Martin J. Wainwright, et al. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, 2009, NIPS.

[50] A. Belloni, et al. Least Squares After Model Selection in High-Dimensional Sparse Models, 2009, arXiv:1001.0188.

[51] Mehmet Caner, et al. Asymptotically Honest Confidence Regions for High Dimensional Parameters by the Desparsified Conservative Lasso, 2014, arXiv:1410.4208.

[52] R. Tibshirani, et al. Adaptive testing for the graphical lasso, 2013, arXiv:1307.4765.

[53] Lee H. Dicker. Residual variance and the signal-to-noise ratio in high-dimensional linear models, 2012, arXiv:1209.0012.

[54] Cun-Hui Zhang, et al. Confidence intervals for low dimensional parameters in high dimensional linear models, 2011, arXiv:1110.2563.

[55] R. Tibshirani, et al. Exact Post-selection Inference for Forward Stepwise and Least Angle Regression, 2014.

[56] E. L. Lehmann, et al. Theory of point estimation, 1950.

[57] Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference, 2004.

[58] Harrison H. Zhou, et al. Asymptotic normality and optimalities in estimation of large Gaussian graphical models, 2013, arXiv:1309.6024.

[59] Sara van de Geer. Statistics for High-Dimensional Data, 2011.

[60] Terence Tao, et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.

[61] Y. Ritov, et al. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization, 2004.

[62] Isaac Dialsingh. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, 2012.

[63] Larry A. Wasserman, et al. Estimating Undirected Graphs Under Weak Assumptions, 2013, arXiv.

[64] P. Hall, et al. Permutation tests for equality of distributions in high-dimensional settings, 2002.

[65] Andrea Montanari, et al. Estimating LASSO Risk and Noise Level, 2013, NIPS.

[66] Cun-Hui Zhang, et al. Scaled sparse linear regression, 2011, arXiv:1104.4595.

[67] N. Meinshausen, et al. Stability selection, 2008, arXiv:0809.2932.

[68] Victor Chernozhukov, et al. Inference on Treatment Effects after Selection Amongst High-Dimensional Controls, 2011.

[69] Guang Cheng, et al. Bootstrapping High Dimensional Time Series, 2014, arXiv:1406.1037.

[70] Stéphane Mallat, et al. Matching pursuits with time-frequency dictionaries, 1993, IEEE Trans. Signal Process.

[71] Bradley Efron. Large-scale inference, 2010.