Valid post-selection inference in model-free linear regression

S.1. Simulations Continued. The simulation setting in this section is the same as in Section 9. We first explain the reason for using the null vector $\beta_0 = 0_p$ in the model. If $\beta_0$ is an arbitrary non-zero vector, then, for fixed covariates, the vectors $X_iY_i$ cannot be identically distributed, and hence only (asymptotically) conservative inference is possible. In simulations this conservativeness compounds with the simultaneity of the guarantee, so that the empirical coverage becomes close to 1 (if not exactly 1).

In the main manuscript we have shown plots comparing our method with Berk et al. (2013) and with selective inference. We label our confidence region $\hat{\mathcal{R}}_{n,M}$ (12) as “UPoSI,” the projected confidence region $\hat{\mathcal{B}}_{n,M}$ (28) as “UPoSI Box,” and Berk et al. (2013) as “PoSI.” Tables 1, 2, and 3 report the exact numbers for the comparison of our method with Berk et al. (2013). Note that the size of each dot in the row plots of Figure 9 indicates the proportion of confidence regions with that volume among models of the same size. In Settings A and B, the confidence region volumes of same-sized models coincide. In Setting C, the volumes of the PoSI and UPoSI Box confidence regions enlarge (hence a smaller $\log(\mathrm{Vol})/|M|$) if the last covariate is included. Tables 4 and 5 report the numbers for the comparison of our method with selective inference when the selection procedure is forward stepwise and LARS, respectively.

Sample splitting is a simple procedure that provides valid inference after selection, as discussed in Section 1.3. We stress that it is valid only for independent observations and that the model selected on the first half of the split can differ from the one selected on the full data. The comparison results with $n = 1000$, $p = 500$ and selection methods forward stepwise, LARS, and BIC are summarized in Figure S.1. For sample splitting we have used the Bonferroni correction to obtain simultaneous inference for all coefficients in the selected model; a schematic sketch of this procedure is given at the end of this section. Table 6 shows the comparison of our method with sample splitting.
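Returning to the choice of the null vector $\beta_0 = 0_p$: the following display is a minimal heuristic under an assumed fixed-design linear model $Y_i = X_i^\top \beta_0 + \varepsilon_i$ with mean-zero i.i.d. errors $\varepsilon_i$ (an illustration only, not necessarily the exact data-generating process of Section 9):
\[
\mathbb{E}[X_i Y_i] \;=\; X_i X_i^\top \beta_0 + X_i\,\mathbb{E}[\varepsilon_i] \;=\; X_i X_i^\top \beta_0,
\]
which varies with $i$ whenever $\beta_0 \neq 0_p$ (outside of degenerate designs), whereas under $\beta_0 = 0_p$ all of these mean vectors equal $0_p$. This is the sense in which a non-null $\beta_0$ forces the vectors $X_i Y_i$ to be non-identically distributed, which in turn yields only conservative simultaneous coverage.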

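As a complement to the description above, the following is a minimal sketch (in Python) of sample splitting with a Bonferroni correction. The function name split_sample_bonferroni and the marginal-correlation screen used for selection are illustrative assumptions standing in for the forward stepwise/LARS/BIC selection of Figure S.1 and Table 6; this is not the code used for the reported simulations.

import numpy as np
from scipy import stats

def split_sample_bonferroni(X, y, n_select=5, alpha=0.05, seed=0):
    """Sample splitting with Bonferroni-corrected simultaneous OLS intervals.

    Selection is done on one half of the data (here a simple marginal
    correlation screen, a stand-in for forward stepwise / LARS / BIC);
    inference is done by OLS on the other half.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    first, second = idx[: n // 2], idx[n // 2:]

    # Model selection on the first half only.
    X1, y1 = X[first], y[first]
    X1c = X1 - X1.mean(axis=0)
    score = np.abs(X1c.T @ (y1 - y1.mean())) / np.linalg.norm(X1c, axis=0)
    selected = np.sort(np.argsort(score)[-n_select:])

    # OLS on the second half, restricted to the selected model M.
    X2, y2 = X[second][:, selected], y[second]
    n2, m = X2.shape
    beta_hat, _, _, _ = np.linalg.lstsq(X2, y2, rcond=None)
    resid = y2 - X2 @ beta_hat
    sigma2 = resid @ resid / (n2 - m)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X2.T @ X2)))

    # Bonferroni: each interval at level 1 - alpha/|M| so that all |M|
    # intervals cover simultaneously with probability at least 1 - alpha.
    crit = stats.t.ppf(1 - alpha / (2 * m), df=n2 - m)
    return selected, beta_hat, beta_hat - crit * se, beta_hat + crit * se

# Toy run under the null beta_0 = 0_p, mimicking n = 1000 and p = 500.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 500))
y = rng.standard_normal(1000)
print(split_sample_bonferroni(X, y))

The key point is that selection uses only the first half of the data, so the OLS intervals computed on the second half are valid conditionally on the selected model; adjusting each interval to level $1 - \alpha/|M|$ then gives simultaneous coverage of all $|M|$ coefficients in that model.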
References

[1] Anderson, T. W. (1955). The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities.

[2] Buehler, R. et al. (1963). Note on a Conditional Property of Student's $t$.

[3] Olshen, R. (1973). The Conditional Level of the F-Test.

[4] Rencher, A. C. et al. (1980). Inflation of $R^2$ in Best Subset Regression.

[5] Freedman, D. (1983). A Note on Screening Regression Equations.

[6] Liu, R. Y. et al. (1995). Using i.i.d. bootstrap inference for general non-i.i.d. models.

[7] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso.

[8] Pötscher, B. M. et al. (1997). Dynamic Nonlinear Econometric Models.

[9] Hjort, N. et al. (2003). Frequentist Model Average Estimators.

[10] Wu, W. (2005). Nonlinear system theory: another look at dependence. Proceedings of the National Academy of Sciences of the United States of America.

[11] An Asymptotic Theory for Model Selection Inference in General Semiparametric Problems (2006).

[12] Candès, E. and Tao, T. (2005). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. arXiv:math/0506081.

[13] Belloni, A. et al. (2011). Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming. arXiv:1009.5689.

[14] Mielniczuk, J. et al. (2010). A new look at measuring dependence.

[15] Tsybakov, A. et al. (2008). Sparse recovery under matrix uncertainty. arXiv:0812.2818.

[16] Loh, P.-L. et al. (2011). High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. NIPS.

[17] Simmons, J. P., Nelson, L. D. and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant.

[18] Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2013). Exact post-selection inference, with application to the lasso. arXiv:1311.6238.

[19] Liu, W. et al. (2013). Probability and moment inequalities under dependence.

[20] Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. arXiv:1306.1059.

[21] Mannor, S. et al. (2013). Robust Sparse Regression under Adversarial Corruption. ICML.

[22] Tibshirani, R. et al. (2014). Exact Post-Selection Inference for Sequential Regression Procedures. arXiv:1401.3889.

[23] Buja, A. et al. (2014). Models as Approximations, Part I: A Conspiracy of Nonlinearity and Random Regressors in Linear Regression. arXiv:1404.1578.

[24] Tsybakov, A. et al. (2014). Linear and conic programming estimators in high dimensional errors-in-variables models. arXiv:1408.0241.

[25] Kato, K. et al. (2014). Central limit theorems and bootstrap in high dimensions. arXiv:1412.3661.

[26] Sun, D. L. et al. (2014). Optimal Inference After Model Selection. arXiv:1410.2597.

[27] Cheng, G. et al. (2014). Bootstrapping High Dimensional Time Series. arXiv:1406.1037.

[28] Taylor, J. E. et al. (2016). MAGIC: a general, powerful and tractable method for selective inference. arXiv:1607.02630.

[29] Rinaldo, A. et al. (2016). Bootstrapping and sample splitting for high-dimensional, assumption-lean inference. The Annals of Statistics.

[30] Bachoc, F. et al. (2016). Uniformly valid confidence intervals post-model-selection. The Annals of Statistics.

[31] Wu, Y. et al. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors.

[32] Cui, Y. et al. (2016). Sparse estimation of high-dimensional correlation matrices. Computational Statistics & Data Analysis.

[33] Kuchibhotla, A. K. et al. (2018). A Model Free Perspective for Linear Regression: Uniform-in-model Bounds for Post Selection Inference.

[34] Tibshirani, R. et al. (2015). Uniform asymptotic inference and the bootstrap after model selection. The Annals of Statistics.

[35] Kuchibhotla, A. K. et al. (2018). Model-free Study of Ordinary Least Squares Linear Regression. arXiv:1809.10538.

[36] Leeb, H. et al. (2018). Expected length of post-model-selection confidence intervals conditional on polyhedral constraints. arXiv:1803.01665.

[37] Kuchibhotla, A. K. et al. (2018). Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression. arXiv:1804.02605.

[38] Blanchard, G. et al. (2018). On the Post Selection Inference constant under Restricted Isometry Properties. arXiv:1804.07566.

[39] Bachoc, F. et al. (2014). Valid confidence intervals for post-model-selection predictors. The Annals of Statistics.

[40] Zhang, C.-H. et al. (2017). Beyond Gaussian approximation: Bootstrap for maxima of sums of independent random vectors. The Annals of Statistics.