Evaluation Procedures for Forecasting with Spatio-Temporal Data

The amount of available spatio-temporal data has been increasing as large-scale data collection (e.g., from geosensor networks) becomes more prevalent. This has led to a growing number of spatio-temporal forecasting applications that use geo-referenced time series data, motivated by important domains such as environmental monitoring (e.g., air pollution index, forest fire risk prediction). Being able to properly assess the performance of new forecasting approaches is fundamental to achieving progress. However, the dependence between observations that the spatio-temporal context implies, besides being challenging in the modelling step, also raises issues for performance estimation, as indicated by previous work. In this paper, we empirically compare several variants of cross-validation (CV) and out-of-sample (OOS) performance estimation procedures that respect data ordering, using both artificially generated and real-world spatio-temporal data sets. Our results show that both CV and OOS procedures provide useful estimates. Further, they suggest that blocking may be useful in addressing CV’s tendency to underestimate error. OOS estimates can be very sensitive to test size, as expected, but they can be improved by careful management of the temporal dimension in training. Code related to this paper is available at: https://github.com/mrfoliveira/Evaluation-procedures-for-forecasting-with-spatio-temporal-data.
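
To make the two families of procedures concrete, the following is a minimal sketch of how a temporally blocked CV split and a simple OOS holdout split might be built from the time stamps of a spatio-temporal data set. It is not the implementation from the paper's repository: the function names, the `test_fraction` parameter, and the toy data (3 locations observed at 6 time steps) are illustrative assumptions, and the paper compares several more variants than the two shown here.

```python
def temporal_block_folds(time_ids, n_folds):
    """Blocked CV (illustrative): each fold tests on one contiguous block of
    time stamps and trains on every observation outside that block."""
    times = sorted(set(time_ids))
    if not 1 <= n_folds <= len(times):
        raise ValueError("n_folds must be between 1 and the number of time stamps")
    # Map each distinct time stamp to a contiguous block index 0..n_folds-1.
    block_of = {t: (i * n_folds) // len(times) for i, t in enumerate(times)}
    folds = []
    for k in range(n_folds):
        test = [i for i, t in enumerate(time_ids) if block_of[t] == k]
        train = [i for i, t in enumerate(time_ids) if block_of[t] != k]
        folds.append((train, test))
    return folds


def oos_holdout(time_ids, test_fraction):
    """OOS holdout (illustrative): the most recent share of time stamps is the
    test set, and only strictly earlier observations are used for training."""
    times = sorted(set(time_ids))
    n_test = max(1, round(len(times) * test_fraction))
    if n_test >= len(times):
        raise ValueError("test_fraction leaves no time stamps for training")
    cutoff = times[len(times) - n_test]
    train = [i for i, t in enumerate(time_ids) if t < cutoff]
    test = [i for i, t in enumerate(time_ids) if t >= cutoff]
    return train, test


# Toy data (assumption): 3 locations observed at each of 6 time steps,
# stored time-major, so rows 0-2 belong to time 0, rows 3-5 to time 1, etc.
time_ids = [t for t in range(6) for _ in range(3)]

folds = temporal_block_folds(time_ids, n_folds=3)
train_idx, test_idx = oos_holdout(time_ids, test_fraction=1 / 3)
```

In this sketch the blocked CV folds may train on observations that come after the test block, whereas the OOS split never does; changing `test_fraction` is one simple way to probe the sensitivity to test size mentioned above.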
