论文信息 - Valid Post-selection Inference in Assumption-lean Linear Regression

Valid Post-selection Inference in Assumption-lean Linear Regression

Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in the recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by relaxing the distributional assumptions. A major drawback of the aforementioned works is that the construction of valid confidence regions is computationally intensive. In this paper, we first prove that post-selection inference is equivalent to simultaneous inference and then construct valid post-selection confidence regions which are computationally simple. Our construction is based on deterministic inequalities and apply to independent as well as dependent random variables without the requirement of correct distributional assumptions. Finally, we compare the volume of our confidence regions with the existing ones and show that under non-stochastic covariates, our regions are much smaller.

[1] T. W. Anderson. The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities , 1955 .

[2] R. Buehler,et al. Note on a Conditional Property of Student's $t^1$ , 1963 .

[3] P. J. Huber. Robust Estimation of a Location Parameter , 1964 .

[4] P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions , 1967 .

[5] R. Olshen. The Conditional Level of the F—Test , 1973 .

[6] P. Sen. Asymptotic Properties of Maximum Likelihood Estimators Based on Conditional Specification , 1979 .

[7] A. C. Rencher,et al. Inflation of R2 in Best Subset Regression , 1980 .

[8] D. Freedman. A Note on Screening Regression Equations , 1983 .

[9] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .

[10] B. M. Pötscher,et al. Dynamic Nonlinear Econometric Models , 1997 .

[11] E. Giné,et al. Decoupling: From Dependence to Independence , 1998 .

[12] E. Rio,et al. Concentration inequalities, large and moderate deviations for self-normalized empirical processes , 2002 .

[13] N. Hjort,et al. Frequentist Model Average Estimators , 2003 .

[14] W. Wu. Nonlinear system theory: another look at dependence. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[15] Convergence of the optimal M-estimator over a parametric family of M-estimators , 2005 .

[16] R. Carroll,et al. An Asymptotic Theory for Model Selection Inference in General Semiparametric Problems , 2007 .

[17] A. K. Md. Ehsanes Saleh,et al. Theory of preliminary test and Stein-type estimation with applications , 2006 .

[18] Terence Tao,et al. The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[19] Bin Yu,et al. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence , 2008, 0811.3628.

[20] Gareth M. James,et al. A generalized Dantzig selector with shrinkage tuning , 2009 .

[21] Martin J. Wainwright,et al. A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers , 2009, NIPS.

[22] J. Mielniczuk,et al. A new look at measuring dependence , 2010 .

[23] A. Tsybakov,et al. Sparse recovery under matrix uncertainty , 2008, 0812.2818.

[24] W. Wu. Asymptotic theory for stationary processes , 2011 .

[25] Leif D. Nelson,et al. False-Positive Psychology , 2011, Psychological science.

[26] Po-Ling Loh,et al. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity , 2011, NIPS.