Machine learning strategies for multi-step-ahead time series forecasting

How much electricity will be consumed over the next 24 hours? What will the temperature be over the next three days? How many units of a certain product will be sold over the next few months? Answering these questions often requires forecasting several future observations from a given sequence of historical observations, called a time series. Historically, time series forecasting has been studied mainly in econometrics and statistics. In the last two decades, machine learning, a field concerned with the development of algorithms that can automatically learn from data, has become one of the most active areas of predictive modeling research. This success is largely due to the superior performance of machine learning prediction algorithms in applications as diverse as natural language processing, speech recognition and spam detection. However, there has been very little research at the intersection of time series forecasting and machine learning.

The goal of this dissertation is to narrow this gap by addressing the problem of multi-step-ahead time series forecasting from the perspective of machine learning. To that end, we propose a series of forecasting strategies based on machine learning algorithms.

Multi-step-ahead forecasts can be produced recursively, by iterating a one-step-ahead model, or directly, by using a specific model for each horizon. As a first contribution, we conduct an in-depth study that compares recursive and direct forecasts generated with different learning algorithms for different data generating processes. More precisely, we decompose the multi-step mean squared forecast errors into their bias and variance components, and analyze their behavior over the forecast horizon for different time series lengths.
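To make the distinction between recursive and direct forecasts concrete, here is a minimal sketch. It assumes a deliberately simple learner (a one-lag linear model through the origin) that the dissertation does not prescribe; the function names `fit_slope`, `recursive_forecast` and `direct_forecast` are hypothetical and chosen only for illustration.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x through the origin (illustrative learner)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def recursive_forecast(series, horizon):
    """Fit a single one-step-ahead model, then iterate it `horizon` times."""
    phi = fit_slope(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = phi * last  # feed the previous forecast back in as input
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Fit a separate model for each horizon h, mapping y[t] to y[t + h]."""
    forecasts = []
    for h in range(1, horizon + 1):
        phi_h = fit_slope(series[:-h], series[h:])
        forecasts.append(phi_h * series[-1])
    return forecasts


# Noise-free first-order autoregressive series with coefficient 0.5.
series = [8.0 * 0.5 ** t for t in range(6)]
recursive = recursive_forecast(series, horizon=3)
direct = direct_forecast(series, horizon=3)
```

On this noise-free linear series the two strategies coincide. With noisy or nonlinear data they generally differ, which is where the trade-off between bias and estimation variance arises.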
The results and observations made in this study then guide the development of new forecasting strategies.

In particular, we find that choosing between recursive and direct forecasts is not an easy task, since it involves a trade-off between bias and estimation variance that depends on many interacting factors, including the learning model, the underlying data generating process, the time series length and the forecast horizon. As a second contribution, we develop multi-stage forecasting strategies that do not treat the recursive and direct strategies as competitors, but seek to combine their best properties. More precisely, the multi-stage strategies generate recursive linear forecasts, and then adjust these forecasts by modeling the multi-step forecast residuals with direct nonlinear models at each horizon, called rectification models. We propose a first multi-stage strategy, which we call the rectify strategy, that estimates the rectification models using the nearest neighbors model. However, because recursive linear forecasts often need only small adjustments with real-world time series, we also consider a second multi-stage strategy, called the boost strategy, that estimates the rectification models using gradient boosting algorithms built on so-called weak learners.

Generating multi-step forecasts using a different model at each horizon provides great modeling flexibility. However, selecting these models independently can lead to irregularities in the forecasts that can increase the forecast variance. The problem is exacerbated with nonlinear machine learning models estimated from short time series. To address this issue, and as a third contribution, we introduce and analyze multi-horizon forecasting strategies that exploit the information contained in other horizons when learning the model for each horizon.
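The two-stage idea behind the rectify strategy described above can be sketched as follows. This is only an illustration under simplifying assumptions that are not taken from the dissertation: a one-lag linear base model through the origin, a one-dimensional nearest neighbors regressor, and the hypothetical names `fit_slope`, `knn_predict` and `rectify_forecast`.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x through the origin (illustrative base model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def knn_predict(xs, ys, query, k):
    """Average the targets of the k training inputs closest to `query`."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def rectify_forecast(series, horizon, k=3):
    """Recursive linear forecasts, adjusted per horizon by a k-NN residual model."""
    phi = fit_slope(series[:-1], series[1:])
    last = series[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        inputs, targets = series[:-h], series[h:]
        # In-sample residuals of the h-step recursive linear forecast.
        residuals = [y - phi ** h * x for x, y in zip(inputs, targets)]
        base = phi ** h * last  # stage 1: recursive linear forecast
        correction = knn_predict(inputs, residuals, last, k)  # stage 2: rectification
        forecasts.append(base + correction)
    return forecasts


# On a noise-free linear series the residuals vanish, so no correction is applied.
series = [8.0 * 0.5 ** t for t in range(6)]
rectified = rectify_forecast(series, horizon=3)
```

When the linear base model already captures the data, the rectification step leaves the forecasts unchanged. The correction only matters when the recursive linear forecasts leave structure in the multi-step residuals.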
In particular, to select the lag order and the hyperparameters of each model, multi-horizon strategies minimize forecast errors over multiple horizons rather than just the horizon of interest.

We compare all the proposed strategies with both the recursive and direct strategies. We first carry out a bias and variance study, then evaluate the different strategies using real-world time series from two past forecasting competitions. For the rectify strategy, in addition to avoiding the choice between recursive and direct forecasts, the results demonstrate that it performs better than, or at least close to, the best of the recursive and direct forecasts in different settings. For the multi-horizon strategies, the results emphasize the decrease in variance compared to single-horizon strategies, especially with linear or weakly nonlinear data generating processes. Overall, we find that the accuracy of multi-step-ahead forecasts based on machine learning algorithms can be significantly improved if an appropriate forecasting strategy is used to select the model parameters and to generate the forecasts.

Lastly, as a fourth contribution, we participated in the Load Forecasting track of the Global Energy Forecasting Competition 2012. The competition involved a hierarchical load forecasting problem in which we were required to backcast and forecast hourly loads for a US utility with twenty geographical zones. Our team, TinTin, ranked fifth out of 105 participating teams, and we were awarded an IEEE Power & Energy Society award.
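The contrast between single-horizon and multi-horizon model selection mentioned above can be illustrated with a small sketch. The validation errors below are made-up numbers used purely for illustration, and the selection rule (summing errors over a set of horizons) is an assumed simplification rather than the exact criterion used in the dissertation; the function names are hypothetical.

```python
def select_single_horizon(errors, h):
    """Pick the candidate with the lowest validation error at horizon index h."""
    return min(errors, key=lambda c: errors[c][h])


def select_multi_horizon(errors, horizons):
    """Pick the candidate with the lowest total error over several horizons."""
    return min(errors, key=lambda c: sum(errors[c][h] for h in horizons))


# Hypothetical validation errors: candidate lag order -> error at horizons 1..3.
errors = {
    1: [0.9, 1.0, 1.6],
    2: [1.0, 1.1, 1.2],
    3: [1.3, 0.9, 1.3],
}

single = [select_single_horizon(errors, h) for h in range(3)]
multi = select_multi_horizon(errors, horizons=range(3))
```

In this toy table, independent selection picks a different lag order at every horizon, while the pooled criterion settles on one candidate that is competitive across all of them. This is the kind of regularity the multi-horizon strategies aim for.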
