On the harm that ignoring pretesting can cause

In econometrics the same data set is typically used to select the model and to estimate the parameters in the selected model. In applied econometrics practice, however, one typically acts as if the model had been given a priori, thus ignoring the fact that the estimators are in fact pretest estimators. Hence one assumes incorrectly that the estimator is unbiased, and that the reported variance, conditional on the selected model, is equal to its unconditional variance. In this paper, we find the unconditional first and second moments of the pretest estimator (in fact, of a more general estimator, the WALS estimator), taking full account of the fact that model selection and estimation are an integrated procedure, and show that the error in not reporting the correct moments can be large. We also show that this error can vary substantially between different model selection procedures. Finally, we ask how the error increases when the number of auxiliary regressors increases.
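
The gap between reported and unconditional moments can be illustrated with a small Monte Carlo sketch. It is not the paper's analytical derivation and does not implement WALS; it covers only the simplest pretest case, assuming one focus regressor `x`, one auxiliary regressor `z`, no intercept, and a t-test on the auxiliary coefficient. The function name `simulate_pretest` and all parameter values (`n`, `beta`, `gamma`, `rho`, `crit`) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pretest(n=50, beta=1.0, gamma=0.3, rho=0.7,
                     reps=5000, crit=1.96, seed=0):
    """Monte Carlo moments of a pretest estimator of beta (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Fixed design: focus regressor x, auxiliary regressor z correlated with x.
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    X = np.column_stack([x, z])
    XtX_inv = np.linalg.inv(X.T @ X)

    estimates = np.empty(reps)     # pretest estimate of beta in each replication
    reported_var = np.empty(reps)  # variance reported conditional on chosen model

    for r in range(reps):
        y = beta * x + gamma * z + rng.standard_normal(n)

        # Unrestricted model: regress y on x and z.
        coef_u = XtX_inv @ (X.T @ y)
        resid_u = y - X @ coef_u
        s2_u = resid_u @ resid_u / (n - 2)
        t_gamma = coef_u[1] / np.sqrt(s2_u * XtX_inv[1, 1])

        if abs(t_gamma) > crit:
            # Auxiliary regressor judged significant: keep the unrestricted fit.
            estimates[r] = coef_u[0]
            reported_var[r] = s2_u * XtX_inv[0, 0]
        else:
            # Otherwise drop z and report the restricted fit as if given a priori.
            b_r = (x @ y) / (x @ x)
            resid_r = y - b_r * x
            s2_r = resid_r @ resid_r / (n - 1)
            estimates[r] = b_r
            reported_var[r] = s2_r / (x @ x)

    bias = estimates.mean() - beta
    return {
        "bias": bias,
        "mc_variance": estimates.var(ddof=1),
        "mse": np.mean((estimates - beta) ** 2),
        "mean_reported_variance": reported_var.mean(),
    }


if __name__ == "__main__":
    for name, value in simulate_pretest().items():
        print(f"{name}: {value:.4f}")
```

Under these assumed settings the auxiliary coefficient sits near the significance threshold, which is where the selection step matters most: the simulated estimator is biased, and its mean squared error exceeds the average variance that would be reported conditional on the selected model.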
