Multistage attention network for multivariate time series prediction

Abstract Deep learning models have been used to predict how the target series in multivariate time series data varies over time. Previous attention-based studies assign the same weight to the influence of multiple non-predictive time series on the target series across different time stages. On real-world datasets, however, the non-predictive time series influence the target series differently at different time stages (for example, through different mutation information). A new multistage attention network is therefore designed to capture these differences. The model consists mainly of an influential attention mechanism and a temporal attention mechanism. The influential attention mechanism uses same-time-stage and different-time-stage attention to capture how the influence of each non-predictive time series on the target series changes over time. The temporal attention mechanism captures how the data itself varies over time. The prediction performance of the proposed model is evaluated comprehensively on two different real-world multivariate time series datasets. The results show that the proposed model outperforms all baseline models and state-of-the-art (SOTA) models. In short, the multistage attention network can effectively learn, from historical data, how different non-predictive time series influence the target series at different time stages.
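To make the two attention stages described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simple additive score for the per-time-stage weighting of the non-predictive series and a dot-product score for the temporal weighting. The function names (`stage_weights`, `temporal_weights`), the scoring parameters `v` and `q`, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stage_weights(X, y, v):
    """Per-time-step weights over the non-predictive series.

    X: (T, n) non-predictive series, y: (T,) target history,
    v: (2,) hypothetical scoring parameters (an assumption).
    Returns a (T, n) array whose rows each sum to 1, so every
    time stage gets its own weighting of the n series.
    """
    scores = np.tanh(v[0] * X + v[1] * y[:, None])
    return softmax(scores, axis=1)


def temporal_weights(H, q):
    """Weights over time steps.

    H: (T, d) per-step features, q: (d,) hypothetical query vector.
    Returns a (T,) array that sums to 1.
    """
    return softmax(H @ q, axis=0)


rng = np.random.default_rng(0)
T, n = 6, 3                          # 6 time steps, 3 non-predictive series
X = rng.normal(size=(T, n))          # synthetic non-predictive series
y = rng.normal(size=T)               # synthetic target history

alpha = stage_weights(X, y, np.array([1.0, 0.5]))
X_weighted = alpha * X               # reweight each series at each time step
beta = temporal_weights(X_weighted, rng.normal(size=n))
context = beta @ X_weighted          # (n,) summary passed to a predictor
```

Because `alpha` is computed separately for each row, two time steps can rank the same non-predictive series differently, which is the behaviour the abstract contrasts with assigning one shared weight across all time stages.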
