Transformation-gated LSTM: efficient capture of short-term mutation dependencies for multivariate time series prediction tasks

Most multivariate time series exhibit complex long-term and short-term dependencies that change over time. Existing recurrent neural network (RNN) variants for sequence tasks mainly strengthen the learning of long-term dependencies, but few RNN architectures are designed to capture short-term mutation information in multivariate time series. In this work, we propose a transformation-gated LSTM (TG-LSTM) to enhance the ability to capture such short-term mutations. First, the transformation gate applies a hyperbolic tangent function to the memory cell state of the previous time step together with the input-gate information of the current time step, without discarding the memory cell state information. Second, the value range of the partial derivative associated with the transformation gate during backpropagation fully reflects the gradient change, yielding a better error gradient flow. We further extend the model to a multi-layer TG-LSTM network and compare its stability and robustness against all baseline models. On two different multivariate time series tasks, the multi-layer TG-LSTM network outperforms all baselines in both prediction accuracy and performance stability.
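
To make the gating idea concrete, the following is a minimal NumPy sketch of one plausible TG-LSTM cell step. It is an assumption-laden illustration, not the paper's exact formulation: the function name `tg_lstm_step`, the parameter layout, the transformation gate `t_t = tanh(c_{t-1} + i_t * g_t)`, and the cell update `c_t = f_t * c_{t-1} + t_t` are all hypothetical choices consistent with the abstract's description (a tanh applied to the previous cell state plus the current input-gate contribution, while retaining the old cell state).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tg_lstm_step(x_t, h_prev, c_prev, params):
    """One step of a hypothetical TG-LSTM cell (sketch, not the paper's equations)."""
    Wf, Wi, Wg, Wo, bf, bi, bg, bo = params
    z = np.concatenate([x_t, h_prev])      # joint input: current x_t and previous hidden state

    f_t = sigmoid(Wf @ z + bf)             # forget gate
    i_t = sigmoid(Wi @ z + bi)             # input gate
    g_t = np.tanh(Wg @ z + bg)             # candidate update
    o_t = sigmoid(Wo @ z + bo)             # output gate

    # Assumed transformation gate: tanh over the previous cell state plus the
    # current input-gate contribution, bounding the short-term update while
    # keeping c_prev information available.
    t_t = np.tanh(c_prev + i_t * g_t)

    # Assumed cell update: gated old state plus the transformed term.
    c_t = f_t * c_prev + t_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hid = 4, 8
    weights = [rng.standard_normal((n_hid, n_in + n_hid)) * 0.1 for _ in range(4)]
    biases = [np.zeros(n_hid) for _ in range(4)]
    params = weights + biases
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for _ in range(5):                      # unroll over a short toy sequence
        x = rng.standard_normal(n_in)
        h, c = tg_lstm_step(x, h, c, params)
    print(h)
```

In this sketch the only change relative to a standard LSTM step is the `t_t` term; stacking several such cells (feeding each layer's `h_t` as the next layer's input) would give the multi-layer TG-LSTM arrangement described in the abstract.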
