Statistically significant forecasting improvements: how much out-of-sample data is likely necessary?

Abstract

Testing whether one model forecasts better out of sample than another requires partitioning the data, a priori, into a model specification/estimation (‘training’) period and a model comparison/evaluation (‘out-of-sample’ or ‘validation’) period. How large must the validation period be for a given mean square forecasting error (MSFE) improvement to be statistically significant at the 5% level? If the forecast errors from each model are normally, independently, and identically distributed (NIID) and the two error series are independent of one another, then the 5% critical points of the F distribution answer this question. But even optimal forecast errors from well-specified models can be serially correlated, and the forecast errors from competing models are typically substantially cross-correlated. For such errors, a validation period of more than 100 observations is typically necessary for a 20% MSFE reduction to be statistically significant at the 5% level. Illustrative applications using actual economic data are given.
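The idealized NIID case can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes zero-mean errors, so that the ratio of the two sample MSFEs over N validation observations follows an F(N, N) distribution under the null of equal error variances, and it assumes a one-sided test. The function names (`f_cdf`, `msfe_ratio_p_value`, `min_validation_length`) are hypothetical. The F distribution function is evaluated through the regularized incomplete beta function, using the standard continued-fraction method, so that only the Python standard library is needed.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def msfe_ratio_p_value(reduction, n):
    """One-sided p-value for an observed MSFE reduction over n observations.

    ASSUMPTION (not from the paper): zero-mean NIID errors, independent across
    the two models, so the larger-to-smaller MSFE ratio is F(n, n) under the null.
    """
    ratio = 1.0 / (1.0 - reduction)  # e.g. a 20% reduction gives a ratio of 1.25
    return 1.0 - f_cdf(ratio, n, n)


def min_validation_length(reduction, alpha=0.05, n_max=10000):
    """Smallest n at which the given MSFE reduction is significant at level alpha."""
    for n in range(2, n_max + 1):
        if msfe_ratio_p_value(reduction, n) <= alpha:
            return n
    return None


if __name__ == "__main__":
    print(min_validation_length(0.20))
```

Calling `min_validation_length(0.20)` returns the required validation length under these idealized assumptions only. The serial correlation and cross-correlation that the abstract identifies as typical of real forecast errors are not modeled in this sketch.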
