Models as Approximations — A Conspiracy of Random Predictors and Model Violations Against Classical Inference in Regression

Abstract. We review and interpret the early insights of Halbert White, who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under “model misspecification,” that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the “sandwich estimator” of standard error. Whereas linear models theory in statistics assumes models to be true and predictors to be fixed, White’s theory permits models to be approximate and predictors to be random. Careful reading of his work shows that the deepest consequences for statistical inference arise from a synergy — a “conspiracy” — of nonlinearity and randomness of the predictors, which invalidates the ancillarity argument that justifies conditioning on the predictors when they are random. An asymptotic comparison of standard error estimates from linear models theory and White’s asymptotic theory shows that discrepancies between them can be of arbitrary magnitude. In practice, when discrepancies exist, linear models theory tends to be too liberal, though occasionally it can be too conservative as well. A valid alternative to the sandwich estimator is provided by the “pairs bootstrap”; in fact, the sandwich estimator can be shown to be a limiting case of the pairs bootstrap.
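The three standard error estimates the abstract compares — classical linear-models theory, White's sandwich estimator, and the pairs bootstrap — can be sketched in a few lines of NumPy. The example below is a minimal illustration, not the paper's own code: the simulated data (a quadratic truth fit with a straight line, with random predictors) is a hypothetical setup chosen to exhibit exactly the nonlinearity-plus-random-x "conspiracy" under which the classical and sandwich estimates diverge.

```python
import numpy as np

def ols_fit(X, y):
    # Least-squares coefficients via the normal equations.
    return np.linalg.solve(X.T @ X, X.T @ y)

def classical_se(X, y, beta):
    # Linear-models theory: fixed X, homoskedastic errors,
    # SE from sigma^2 * (X'X)^{-1}.
    n, p = X.shape
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

def sandwich_se(X, y, beta):
    # White's heteroskedasticity-consistent (HC0) estimator:
    # (X'X)^{-1} [sum_i r_i^2 x_i x_i'] (X'X)^{-1}.
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return np.sqrt(np.diag(bread @ meat @ bread))

def pairs_bootstrap_se(X, y, n_boot=2000, seed=0):
    # Resample (x_i, y_i) pairs with replacement, refit,
    # and take the SD of the refitted coefficients.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        betas[b] = ols_fit(X[idx], y[idx])
    return betas.std(axis=0, ddof=1)

# Misspecified setting: quadratic truth, linear working model, random x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-1.0, 3.0, n)
y = x ** 2 + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
beta = ols_fit(X, y)
print("classical:", classical_se(X, y, beta))
print("sandwich: ", sandwich_se(X, y, beta))
print("pairs bs: ", pairs_bootstrap_se(X, y))
```

In this setting the sandwich and pairs-bootstrap standard errors agree closely with each other (the former being a limiting case of the latter), while the classical estimate can differ from both by a substantial factor.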
