Valid Post-selection Inference in Assumption-lean Linear Regression

Constructing valid statistical inference for estimators based on data-driven selection has received considerable attention in recent years. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection; its setting is restrictive, and Bachoc et al. (2016) extended it by relaxing the distributional assumptions. A major drawback of these works is that the construction of valid confidence regions is computationally intensive. In this paper, we first prove that post-selection inference is equivalent to simultaneous inference and then construct valid post-selection confidence regions that are computationally simple. Our construction is based on deterministic inequalities and applies to independent as well as dependent random variables without requiring correct distributional assumptions. Finally, we compare the volume of our confidence regions with existing ones and show that, under non-stochastic covariates, our regions are much smaller.
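The reduction of post-selection inference to simultaneous inference can be illustrated numerically. The sketch below is only a toy illustration under assumed Gaussian homoscedastic errors with known noise level and a small fixed design (not the paper's deterministic-inequality construction): it estimates by simulation the 95% quantile K of the maximal |t|-statistic over all non-empty submodels. Intervals of the form estimate ± K·(standard error) then cover simultaneously over every submodel, and hence remain valid after any selection rule.

```python
import itertools
import numpy as np

# Toy setup: Gaussian homoscedastic errors with known sigma and a small,
# fixed (non-stochastic) design. All names here are illustrative.
rng = np.random.default_rng(0)
n, p, sigma = 100, 3, 1.0
X = rng.standard_normal((n, p))

def max_abs_t(y):
    """Maximum of |t_{j,M}| over all non-empty submodels M and coordinates j in M."""
    worst = 0.0
    for r in range(1, p + 1):
        for M in itertools.combinations(range(p), r):
            XM = X[:, list(M)]
            G = np.linalg.inv(XM.T @ XM)
            beta_M = G @ (XM.T @ y)           # OLS fit within submodel M
            se = sigma * np.sqrt(np.diag(G))  # exact standard errors (sigma known)
            worst = max(worst, float(np.max(np.abs(beta_M / se))))
    return worst

# Null distribution of the max-|t| statistic (all true coefficients zero);
# its 0.95 quantile is a PoSI-type simultaneous constant K.
draws = np.array([max_abs_t(sigma * rng.standard_normal(n)) for _ in range(2000)])
K = np.quantile(draws, 0.95)
# K exceeds the single-model normal quantile 1.96, reflecting the price of
# guarding against every possible model selection at once.
```

Computing K this way enumerates all 2^p - 1 submodels, which is exactly the computational burden the abstract attributes to the earlier constructions; the paper's contribution is a computationally simple alternative.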

[1] T. W. Anderson. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities, 1955.

[2] R. Buehler et al. Note on a Conditional Property of Student's $t$, 1963.

[3] P. J. Huber. Robust Estimation of a Location Parameter, 1964.

[4] P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions, 1967.

[5] R. Olshen. The Conditional Level of the F-Test, 1973.

[6] P. Sen. Asymptotic Properties of Maximum Likelihood Estimators Based on Conditional Specification, 1979.

[7] A. C. Rencher et al. Inflation of $R^2$ in Best Subset Regression, 1980.

[8] D. Freedman. A Note on Screening Regression Equations, 1983.

[9] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[10] B. M. Pötscher et al. Dynamic Nonlinear Econometric Models, 1997.

[11] E. Giné et al. Decoupling: From Dependence to Independence, 1998.

[12] E. Rio et al. Concentration inequalities, large and moderate deviations for self-normalized empirical processes, 2002.

[13] N. Hjort et al. Frequentist Model Average Estimators, 2003.

[14] W. Wu et al. Nonlinear system theory: another look at dependence. Proceedings of the National Academy of Sciences of the United States of America, 2005.

[15] Convergence of the optimal M-estimator over a parametric family of M-estimators, 2005.

[16] R. Carroll et al. An Asymptotic Theory for Model Selection Inference in General Semiparametric Problems, 2007.

[17] A. K. Md. Ehsanes Saleh. Theory of preliminary test and Stein-type estimation with applications, 2006.

[18] T. Tao et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005. arXiv:math/0506081.

[19] B. Yu et al. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence, 2008. arXiv:0811.3628.

[20] G. M. James et al. A generalized Dantzig selector with shrinkage tuning, 2009.

[21] M. J. Wainwright et al. A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. NIPS, 2009.

[22] J. Mielniczuk et al. A new look at measuring dependence, 2010.

[23] A. Tsybakov et al. Sparse recovery under matrix uncertainty, 2008. arXiv:0812.2818.

[24] W. Wu et al. Asymptotic theory for stationary processes, 2011.

[25] L. D. Nelson et al. False-Positive Psychology. Psychological Science, 2011.

[26] P.-L. Loh et al. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. NIPS, 2011.

[27] A. Belloni et al. Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming, 2011.

[28] K. Kato et al. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors, 2013.

[29] A. Buja et al. Valid Post-selection Inference, 2013.

[30] W. Liu et al. Probability and moment inequalities under dependence, 2013.

[31] W. Wu et al. Covariance and precision matrix estimation for high-dimensional time series, 2013. arXiv:1401.0993.

[32] S. Mannor et al. Robust Sparse Regression under Adversarial Corruption. ICML, 2013.

[33] R. Tibshirani et al. Exact Post-Selection Inference for Sequential Regression Procedures, 2014. arXiv:1401.3889.

[34] A. Buja et al. Models as Approximations, Part I: A Conspiracy of Nonlinearity and Random Regressors in Linear Regression, 2014. arXiv:1404.1578.

[35] A. Tsybakov et al. Linear and conic programming estimators in high-dimensional errors-in-variables models, 2014. arXiv:1408.0241.

[36] K. Kato et al. Central limit theorems and bootstrap in high dimensions, 2014. arXiv:1412.3661.

[37] S. Geer et al. On higher order isotropy conditions and lower bounds for sparse quadratic forms, 2014. arXiv:1405.5995.

[38] R. H. Zamar et al. Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination, 2014.

[39] J. E. Taylor et al. MAGIC: a general, powerful and tractable method for selective inference, 2016. arXiv:1607.02630.

[40] D. L. Sun et al. Exact post-selection inference, with application to the lasso, 2013. arXiv:1311.6238.

[41] A. Rinaldo et al. Bootstrapping and sample splitting for high-dimensional, assumption-lean inference. The Annals of Statistics, 2016.

[42] F. Bachoc et al. Uniformly valid confidence intervals post-model-selection. The Annals of Statistics, 2016.

[43] Y. Wu et al. Performance bounds for parameter estimates of high-dimensional linear models with correlated errors, 2016.

[44] Y. Cui et al. Sparse estimation of high-dimensional correlation matrices. Computational Statistics & Data Analysis, 2016.

[45] A. K. Kuchibhotla et al. A Model Free Perspective for Linear Regression: Uniform-in-model Bounds for Post Selection Inference, 2018.

[46] R. Tibshirani et al. Uniform asymptotic inference and the bootstrap after model selection. The Annals of Statistics, 2015.

[47] H. Leeb et al. Expected length of post-model-selection confidence intervals conditional on polyhedral constraints, 2018. arXiv:1803.01665.

[48] A. K. Kuchibhotla et al. Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression, 2018. arXiv:1804.02605.

[49] G. Blanchard et al. On the Post Selection Inference constant under Restricted Isometry Properties, 2018. arXiv:1804.07566.