Generative Temporal Models with Memory

We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally distant past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and tools for gaining insight into the models' operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence and reuse this stored information efficiently. This allows them to perform substantially better than existing models built on well-known recurrent neural networks such as LSTMs.
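To make the variational training setup concrete, below is a minimal, hypothetical sketch of such a model: an LSTM controller augmented with a slot-based external memory (content-based softmax reads, naive first-in-first-out writes), where a prior network and an inference network parameterize diagonal Gaussians over a latent z_t and training maximizes the per-timestep evidence lower bound (ELBO) via the reparameterization trick. The layer sizes, the memory write rule, and the unit-variance Gaussian likelihood are all assumptions made for illustration; this is not the paper's exact architecture.

```python
# A minimal sketch (assumed design, not the authors' model): an LSTM
# controller with a slot-based external memory, trained on the sequential ELBO.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryGTM(nn.Module):
    def __init__(self, x_dim=8, z_dim=4, h_dim=32, mem_slots=16):
        super().__init__()
        self.rnn = nn.LSTMCell(x_dim + z_dim, h_dim)
        self.key = nn.Linear(h_dim, h_dim)                   # query for memory reads
        self.prior = nn.Linear(2 * h_dim, 2 * z_dim)         # p(z_t | h_t, r_t)
        self.post = nn.Linear(2 * h_dim + x_dim, 2 * z_dim)  # q(z_t | h_t, r_t, x_t)
        self.dec = nn.Linear(z_dim + 2 * h_dim, x_dim)       # p(x_t | z_t, h_t, r_t)
        self.h_dim, self.z_dim, self.mem_slots = h_dim, z_dim, mem_slots

    def read(self, mem, h):
        # Content-based read: softmax attention over memory rows.
        w = F.softmax(mem @ self.key(h).unsqueeze(-1), dim=1)  # (B, slots, 1)
        return (w * mem).sum(1)                                # (B, h_dim)

    def forward(self, x):  # x: (B, T, x_dim); returns the sequential ELBO
        B, T, _ = x.shape
        h = x.new_zeros(B, self.h_dim)
        c = x.new_zeros(B, self.h_dim)
        z = x.new_zeros(B, self.z_dim)
        rows = [x.new_zeros(B, self.h_dim) for _ in range(self.mem_slots)]
        elbo = x.new_zeros(())
        for t in range(T):
            mem = torch.stack(rows, dim=1)                 # (B, slots, h_dim)
            ctx = torch.cat([h, self.read(mem, h)], dim=-1)
            p_mu, p_lv = self.prior(ctx).chunk(2, dim=-1)  # prior mean, log-var
            q_mu, q_lv = self.post(
                torch.cat([ctx, x[:, t]], dim=-1)).chunk(2, dim=-1)
            # Reparameterized sample from the approximate posterior.
            z = q_mu + torch.randn_like(q_mu) * (0.5 * q_lv).exp()
            x_mu = self.dec(torch.cat([z, ctx], dim=-1))
            recon = -0.5 * ((x[:, t] - x_mu) ** 2).sum(-1)  # unit-variance Gaussian
            # Closed-form KL(q || p) between diagonal Gaussians.
            kl = 0.5 * ((q_lv - p_lv).exp()
                        + (q_mu - p_mu) ** 2 * (-p_lv).exp()
                        - 1.0 + p_lv - q_lv).sum(-1)
            elbo = elbo + (recon - kl).mean()
            h, c = self.rnn(torch.cat([x[:, t], z], dim=-1), (h, c))
            rows[t % self.mem_slots] = h                   # naive FIFO slot write
        return elbo
```

Usage follows the standard pattern of minimizing the negative ELBO on toy sequences:

```python
model = MemoryGTM()
x = torch.randn(2, 20, 8)   # toy batch: 2 sequences of length 20
loss = -model(x)            # maximize ELBO <=> minimize negative ELBO
loss.backward()
```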
