Models as Approximations — A Conspiracy of Random Regressors and Model Misspecification Against Classical Inference in Regression

Abstract. More than thirty years ago, Halbert White inaugurated a "model-robust" form of statistical inference based on the "sandwich estimator" of standard error. This estimator is well known to be "heteroskedasticity-consistent", but it is less well known to be "nonlinearity-consistent" as well. Nonlinearity raises fundamental issues because regressors are no longer ancillary and hence cannot be treated as fixed. As a result, (1) the regressor distribution affects the parameters, and (2) the randomness of the regressors conspires with the nonlinearity to become a source of sampling variability in the coefficient estimates. These effects generalize to arbitrary types of regression in which regressors have traditionally been treated as ancillary. The generalizations lead to a novel notion of misspecification and a re-interpretation of regression parameters as statistical functionals. The cost of a model-robust approach is that the meaning of the parameters must be rethought and inference must be based on model-robust standard errors. For linear OLS, model-trusting standard errors can deviate from model-robust standard errors by arbitrary magnitudes. In practice, the two types of standard errors can be compared with a diagnostic test.
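As a minimal sketch of the comparison described in the abstract (not taken from the paper; the simulated data-generating process and all variable names are invented for illustration), the following Python snippet fits a straight line to a quadratic mean function with a random regressor and contrasts the classical "model-trusting" OLS standard errors with White's sandwich ("model-robust", HC0) standard errors.

```python
# Hypothetical illustration: classical vs. sandwich standard errors for OLS
# when the true mean function is nonlinear and the regressor is random.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(-1, 3, size=n)             # random regressor, not fixed by design
y = x**2 + rng.normal(scale=0.5, size=n)   # nonlinear truth, fitted with a linear model

X = np.column_stack([np.ones(n), x])       # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                   # OLS coefficients (a statistical functional
                                           # of the joint (x, y) distribution)
resid = y - X @ beta

# Classical (model-trusting) covariance: sigma^2 * (X'X)^{-1}
sigma2 = resid @ resid / (n - X.shape[1])
V_classical = sigma2 * XtX_inv

# Sandwich (model-robust, White/HC0) covariance:
# (X'X)^{-1} * [sum_i r_i^2 x_i x_i'] * (X'X)^{-1}
meat = X.T @ (X * resid[:, None]**2)
V_sandwich = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(V_classical))
se_sandwich = np.sqrt(np.diag(V_sandwich))
print("classical SEs:", se_classical)
print("sandwich  SEs:", se_sandwich)
print("ratio (sandwich / classical):", se_sandwich / se_classical)
```

In this kind of setup the ratio of sandwich to classical standard errors can move well away from 1, which is the discrepancy the abstract attributes to the conspiracy of random regressors and nonlinearity; comparing the two estimates is the idea behind the diagnostic test mentioned above.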
