Reduced-Gate Convolutional LSTM Using Predictive Coding for Spatiotemporal Prediction

Spatiotemporal sequence prediction is an important problem in deep learning. We study next-frame(s) video prediction using a deep-learning-based predictive coding framework built from convolutional long short-term memory (convLSTM) modules. We introduce a novel reduced-gate convolutional LSTM (rgcLSTM) architecture that requires a significantly smaller parameter budget than a comparable convLSTM. By using a single multi-function gate, our reduced-gate model achieves equal or better next-frame(s) prediction accuracy than the original convolutional LSTM while using fewer parameters, thereby reducing training time and memory requirements. We tested our reduced-gate modules within a predictive coding architecture on the moving MNIST and KITTI datasets. Compared with the standard convolutional LSTM model, our reduced-gate model reduces the total number of training parameters by approximately 40 percent and elapsed training time by 25 percent, while also improving prediction accuracy. This makes our model more attractive for hardware implementation, especially on small devices. We also explored a space of twenty different gated architectures to gain insight into how our rgcLSTM fits into that space.
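To make the single-gate idea concrete, below is a minimal PyTorch sketch of a single-gate convolutional LSTM cell. The exact update rule, class name, and constructor arguments are illustrative assumptions (not taken from the text above): we assume the one learned gate is reused in the forget, input, and output roles that a standard convLSTM assigns to three separate gates.

# Minimal sketch of a single multi-function-gate convolutional LSTM cell.
# ASSUMED gating equations (illustrative, not quoted from the paper):
#   g_t = sigmoid(Conv([x_t, h_{t-1}]))                  -- one shared gate
#   c_t = g_t * c_{t-1} + g_t * tanh(Conv([x_t, h_{t-1}]))  -- forget + input roles
#   h_t = g_t * tanh(c_t)                                -- output role
import torch
import torch.nn as nn

class SingleGateConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # keep spatial dimensions unchanged
        # One convolution produces both the gate and the candidate state
        # (2 * hidden_channels output maps), versus the four maps
        # (i, f, o gates plus candidate) of a standard convLSTM.
        self.conv = nn.Conv2d(in_channels + hidden_channels,
                              2 * hidden_channels,
                              kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state  # hidden and cell state, each (B, hidden_channels, H, W)
        gate_raw, cand_raw = self.conv(torch.cat([x, h], dim=1)).chunk(2, dim=1)
        g = torch.sigmoid(gate_raw)                 # single multi-function gate
        c_next = g * c + g * torch.tanh(cand_raw)   # gate reused for forget and input
        h_next = g * torch.tanh(c_next)             # gate reused for output
        return h_next, c_next

# Example usage on one frame of a batch of 64x64 grayscale video:
# cell = SingleGateConvLSTMCell(in_channels=1, hidden_channels=16)
# x = torch.randn(4, 1, 64, 64)
# h = c = torch.zeros(4, 16, 64, 64)
# h, c = cell(x, (h, c))

Because the recurrent convolution emits two feature maps per hidden channel instead of four, this sketch roughly halves the recurrent parameter count relative to a standard convLSTM, which is consistent in spirit with the roughly 40 percent reduction in total training parameters reported above.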
