UNIFORM-IN-SUBMODEL BOUNDS FOR LINEAR REGRESSION IN A MODEL-FREE FRAMEWORK

For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a “submodel”) where classical linear regression applies. We analyze linear regression estimators resulting from model selection by proving estimation error and linear representation bounds uniformly over sets of submodels. Based on deterministic inequalities, our results provide “good” rates when applied to both independent and dependent data. These results are useful in meaningfully interpreting the linear regression estimator obtained after exploring and reducing the variables and also in justifying post-model-selection inference. All results are derived under no model assumptions and are nonasymptotic in nature.
