A REALITY CHECK FOR DATA SNOOPING
BY HALBERT WHITE

Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is widely acknowledged by empirical researchers that data snooping is a dangerous practice to be avoided, but in fact it is endemic. The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation. Our purpose here is to provide such methods by specifying a straightforward procedure for testing the null hypothesis that the best model encountered in a specification search has no predictive superiority over a given benchmark model. This permits data snooping to be undertaken with some degree of confidence that one will not mistake results that could have been generated by chance for genuinely good results.
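As a rough illustration of the kind of test described above, the following Python sketch computes a bootstrap p-value for the null hypothesis that the best of K candidate models has no predictive superiority over the benchmark. It assumes a stationary bootstrap with geometrically distributed block lengths to respect time-series dependence. The function names, the layout of the `diffs` array, the mean block length of 10, and the 1,000 replications are illustrative assumptions, not details given in this abstract.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw n time indices in blocks whose lengths are geometric with the given mean.

    Assumption: a stationary-bootstrap resampling scheme; blocks wrap around
    the end of the sample.
    """
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    restart = rng.random(n) < 1.0 / mean_block  # start a new block with prob 1/mean_block
    fresh = rng.integers(n, size=n)             # candidate starting points for new blocks
    for t in range(1, n):
        idx[t] = fresh[t] if restart[t] else (idx[t - 1] + 1) % n
    return idx


def reality_check_pvalue(diffs, n_boot=1000, mean_block=10, seed=0):
    """Bootstrap p-value for 'the best model does not beat the benchmark'.

    diffs[t, k] is the performance of model k minus that of the benchmark
    in period t (hypothetical input layout; larger means better).
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.shape[0]
    means = diffs.mean(axis=0)

    # Observed statistic: scaled average outperformance of the best model.
    stat = np.sqrt(n) * means.max()

    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        # Recentre at the sample means so the resampled maximum mimics the null.
        boot_stat = np.sqrt(n) * (diffs[idx].mean(axis=0) - means).max()
        exceed += boot_stat >= stat
    return exceed / n_boot
```

Because the maximum is taken over all K models inside every bootstrap replication, the p-value accounts for the full specification search rather than for the winning model alone. A small value suggests the best model's advantage is unlikely to be an artifact of the search; a large value means chance cannot be ruled out.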
