A Note on the Validity of Cross-Validation for Evaluating Time Series Prediction

One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). In time series forecasting, however, the inherent serial correlation and potential non-stationarity of the data make its application less straightforward, and practitioners often omit it in favor of an out-of-sample (OOS) evaluation. In this paper, we show that the particular setup in which time series forecasting is usually performed with Machine Learning methods, namely on a matrix of lagged (embedded) observations, renders the use of standard K-fold CV possible. We present theoretical insights supporting our arguments. Furthermore, we present a simulation study in which we show empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.
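The setup the abstract refers to can be illustrated with a short sketch. The code below is an assumption-laden toy example, not the paper's actual experiment: it simulates an AR(1) series, embeds it into a matrix of lagged observations, and compares the mean squared error estimated by standard (shuffled) K-fold CV with a single OOS split, using an ordinary-least-squares autoregressive model. The series length, lag order, and AR coefficient are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series: y_t = 0.7 * y_{t-1} + e_t  (illustrative choice)
n = 500
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + e[t]

# Embed the series: each row holds (y_{t-1}, ..., y_{t-p}) with target y_t.
# This lagged-matrix representation is the "ML setup" for forecasting.
p = 2
X = np.column_stack([y[p - 1 - j : n - 1 - j] for j in range(p)])
target = y[p:]

def mse_linear(X_tr, y_tr, X_te, y_te):
    """Fit ordinary least squares on the training rows, return test MSE."""
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return float(np.mean((X_te @ beta - y_te) ** 2))

# --- Standard K-fold CV: rows shuffled, ignoring temporal order ---
k = 5
idx = rng.permutation(len(target))
folds = np.array_split(idx, k)
cv_mse = float(np.mean([
    mse_linear(np.delete(X, f, axis=0), np.delete(target, f), X[f], target[f])
    for f in folds
]))

# --- OOS evaluation: final 20% of the rows held out in time order ---
cut = int(0.8 * len(target))
oos_mse = mse_linear(X[:cut], target[:cut], X[cut:], target[cut:])

print(f"K-fold CV MSE: {cv_mse:.3f}")
print(f"OOS MSE:       {oos_mse:.3f}")
```

With unit-variance innovations, both estimates should land near the irreducible error variance of 1 and close to each other, which is the kind of agreement the paper's simulation study examines on a much larger scale.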
