Markov cross-validation for time series model evaluations

Cross-validation (CV) is a simple and universal tool for estimating generalization ability; however, existing CV schemes do not handle the periodicity, overlap, or correlation of time series well. We present three criteria that characterize these properties. Based on them, we propose a novel Markov cross-validation (M-CV), whose data partition can be viewed as a Markov process. The partition ensures that the samples within each subset are neither too close together nor too far apart. In this way it avoids both overfitting the model and losing information about the series, either of which could lead to underestimation or overestimation of the error. Furthermore, the subsets produced by the M-CV partition represent the original series well, so the scheme may be extended to time-series or stream-data sampling. Theoretical analysis shows that, among current CV schemes, M-CV is the only one that satisfies all of the above criteria. In addition, the error estimate on the subsets is proved to have lower variance than that on the original series, which ensures the stability of M-CV. Experimental results demonstrate that the proposed M-CV achieves lower bias, variance, and time consumption than other CV schemes.
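The spacing idea behind the partition can be illustrated with a toy sketch: assign each time index to one of k folds by stepping through fold labels as a simple Markov chain, so that two members of the same fold are never adjacent but also never far apart. This is only an illustrative sketch under assumed transition rules (the function name `markov_partition` and the step probabilities are hypothetical), not the paper's exact M-CV construction.

```python
import random

def markov_partition(n, k, seed=0):
    """Partition time indices 0..n-1 into k folds.

    The fold label evolves as a small Markov chain over {0, ..., k-1}:
    it usually advances by 1 and occasionally by 2, so the chain cycles
    through all folds and consecutive members of any single fold end up
    roughly k apart -- neither adjacent (too close) nor widely separated
    (too far). Illustrative only; not the paper's exact scheme.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    state = rng.randrange(k)  # initial fold label
    for i in range(n):
        folds[state].append(i)
        # Markov step: advance by 1 with high probability, by 2 otherwise.
        step = 1 if rng.random() < 0.9 else 2
        state = (state + step) % k
    return folds
```

Each fold can then serve as a held-out set whose samples are spread across the whole series, so every fold sees all phases of a periodic signal while avoiding adjacent (strongly correlated) samples in the same fold.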
