A Comparative Study of Performance Estimation Methods for Time Series Forecasting

Performance estimation denotes the task of estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning task and are used for assessing the overall generalisation ability of models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data, the most common approach is cross-validation. However, the dependency among observations in time series raises caveats about the most appropriate way to estimate performance on such data, and there is currently no settled approach. We compare different variants of cross-validation and different variants of out-of-sample approaches using two case studies: one with 53 real-world time series and another with three synthetic time series. Results show noticeable differences among the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary synthetic time series. However, in real-world scenarios the most accurate estimates are produced by the out-of-sample methods, which preserve the temporal order of observations.
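
To make the contrast concrete, below is a minimal sketch (not taken from the paper) of the two families of procedures on an autoregressively embedded series: standard K-fold cross-validation, which ignores temporal order, versus an out-of-sample holdout that trains only on the past and tests on the future. The synthetic random-walk series, the Ridge model, the embedding dimension of 5, and the 70/30 split are all illustrative assumptions.

```python
# Sketch: K-fold cross-validation vs. out-of-sample (temporal holdout) estimation.
# All modelling choices here are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))  # synthetic random-walk series (assumption)

# Embed the series: predict y[t] from the previous p observations.
p = 5
X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
target = y[p:]

# 1) K-fold cross-validation: folds are formed without regard to temporal order.
cv_errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], target[train_idx])
    cv_errors.append(mean_absolute_error(target[test_idx], model.predict(X[test_idx])))

# 2) Out-of-sample holdout: fit on the first 70% of observations,
#    evaluate on the final 30%, preserving temporal order.
split = int(0.7 * len(X))
model = Ridge().fit(X[:split], target[:split])
oos_error = mean_absolute_error(target[split:], model.predict(X[split:]))

print(f"5-fold CV estimate (temporal order ignored): {np.mean(cv_errors):.3f}")
print(f"Out-of-sample estimate (temporal order kept): {oos_error:.3f}")
```

The paper's point is that the two estimates need not agree: on dependent, possibly non-stationary real-world series, the out-of-sample figure is typically the more faithful proxy for future loss, while on stationary synthetic data the cross-validation estimate can remain usable.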
