Reduced-Gate Convolutional LSTM Design Using Predictive Coding for Next-Frame Video Prediction

Spatiotemporal sequence prediction is an important problem in deep learning. We study next-frame video prediction using a deep-learning-based predictive coding framework built from convolutional long short-term memory (convLSTM) modules. We introduce a novel reduced-gate convolutional LSTM architecture. Our reduced-gate model achieves better next-frame prediction accuracy than the original convolutional LSTM while using a smaller parameter budget, thereby reducing training time. We tested our reduced-gate modules within a predictive coding architecture on the Moving MNIST and KITTI datasets. We found that our reduced-gate model cuts the total number of training parameters and the training time by approximately 40 percent compared with the standard convLSTM model, which makes it attractive for hardware implementation, especially on small devices.
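
To make the gate-sharing idea concrete, below is a minimal PyTorch sketch of what a reduced-gate convLSTM cell could look like. It assumes the parameter reduction comes from computing one shared convolutional gate signal and reusing it in the forget, input, and output roles, coupling the input and forget gates in the GRU/minimal-gated-unit style; the class name ReducedGateConvLSTMCell and all hyperparameters are illustrative, not the authors' exact formulation.

import torch
import torch.nn as nn

class ReducedGateConvLSTMCell(nn.Module):
    """Hypothetical sketch of a reduced-gate convLSTM cell.

    A standard convLSTM computes three independent gates (input, forget,
    output) plus a candidate activation, i.e. four convolutional blocks.
    This sketch computes a single shared gate signal and reuses it in all
    three gate roles, leaving only two convolutional blocks. This is one
    plausible reading of the abstract, not the authors' exact design.
    """

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # Single shared gate: one conv over [input, previous hidden state].
        self.gate_conv = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size,
                                   padding=padding)
        # Candidate (cell update) activation.
        self.cand_conv = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size,
                                   padding=padding)

    def forward(self, x, state):
        h_prev, c_prev = state
        z = torch.cat([x, h_prev], dim=1)        # stack along channels
        g = torch.sigmoid(self.gate_conv(z))     # shared gate signal
        c_tilde = torch.tanh(self.cand_conv(z))  # candidate cell state
        # Shared gate plays the forget role on the old cell state and
        # (coupled as 1 - g) the input role on the candidate.
        c = g * c_prev + (1.0 - g) * c_tilde
        # Shared gate also plays the output role.
        h = g * torch.tanh(c)
        return h, c

# Example: one step over a batch of 8 RGB frames of size 64x64,
# with a hypothetical hidden width of 32 channels.
cell = ReducedGateConvLSTMCell(in_channels=3, hidden_channels=32)
x = torch.randn(8, 3, 64, 64)
h = torch.zeros(8, 32, 64, 64)
c = torch.zeros(8, 32, 64, 64)
h, c = cell(x, (h, c))

Under this assumption the cell needs two convolutional blocks where a standard convLSTM needs four, which is in the ballpark of the roughly 40 percent parameter and training-time reduction reported in the abstract.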
