Time Series Forecasting using RNNs: an Extended Attention Mechanism to Model Periods and Handle Missing Values

In this paper, we study the use of recurrent neural networks (RNNs) for modeling and forecasting time series. We first illustrate that standard sequence-to-sequence RNNs neither capture periods in time series well nor handle missing values well, even though many real-life time series are periodic and contain missing values. We then propose an extended attention mechanism that can be deployed on top of any RNN and that is designed to capture periods and make the RNN more robust to missing values. We show the effectiveness of this novel model through extensive experiments with multiple univariate and multivariate datasets.
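
The abstract does not give the formulation of the extended attention mechanism, so the following is only a minimal sketch of the general idea, not the model proposed in the paper. It assumes plain scaled dot-product attention over past RNN hidden states, an optional per-position bias (`lag_bias`) that could let the model favor time steps at periodic lags, and a boolean mask (`observed`) that excludes time steps whose input value was missing. The function name, parameters, and shapes are all hypothetical.

```python
import numpy as np

def masked_attention(query, states, observed, lag_bias=None):
    """Illustrative attention over past hidden states (assumed design, not the paper's).

    query    : (d,)   current decoder state
    states   : (T, d) hidden states for the T past time steps
    observed : (T,)   boolean, False where the input value was missing
    lag_bias : (T,)   optional additive score per position (hypothetical
                      way to emphasize periodic lags)
    Returns the context vector (d,) and the attention weights (T,).
    """
    observed = np.asarray(observed, dtype=bool)
    if not observed.any():
        raise ValueError("at least one time step must be observed")

    # Scaled dot-product scores between the query and each past state.
    scores = states @ query / np.sqrt(states.shape[1])
    if lag_bias is not None:
        scores = scores + lag_bias

    # Missing time steps get a score of -inf, hence zero weight after softmax.
    scores = np.where(observed, scores, -np.inf)

    # Numerically stable softmax over the observed positions.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Context vector: weighted average of the past hidden states.
    context = weights @ states
    return context, weights
```

In this sketch, masking before the softmax means the weights are renormalized over the observed time steps only, so a gap in the series does not contribute to the context vector. A learned `lag_bias` is one simple way a model could assign more weight to positions one period back, but how the paper actually achieves this is not stated in the abstract.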
