Variational Temporal Abstraction

We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data. We propose Variational Temporal Abstraction (VTA), a hierarchical recurrent state space model that can infer latent temporal structure and thereby perform stochastic state transitions hierarchically. We also apply this model to implement jumpy imagination in imagination-augmented agent learning, improving the efficiency of imagination. In experiments, we demonstrate that the proposed method models 2D and 3D visual sequence datasets while discovering interpretable temporal structure, and that its application to jumpy imagination enables more efficient agent learning in a 3D navigation task.
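
To make the abstract's description more concrete, the sketch below illustrates one way a two-level hierarchical recurrent state space model of this kind could be organized: a binary boundary variable gates transitions of the high-level (temporal-abstraction) state, while the low-level state transitions at every step conditioned on the current abstract state and decodes to observations. This is a minimal illustration under assumed names, dimensions, and a relaxed-Bernoulli boundary (HierarchicalSSM, z_dim, s_dim, temp are all hypothetical); it is not the paper's actual implementation, whose inference network and exact parameterization differ.

```python
# Minimal PyTorch sketch of a two-level hierarchical recurrent state space model
# in the spirit of VTA. Names, dimensions, and the relaxed-Bernoulli boundary are
# assumptions for illustration, not the paper's exact architecture.
import torch
import torch.nn as nn
from torch.distributions import Normal, RelaxedBernoulli


class HierarchicalSSM(nn.Module):
    def __init__(self, obs_dim=64, z_dim=8, s_dim=8, hid=32, temp=0.5):
        super().__init__()
        self.hid, self.z_dim, self.s_dim = hid, z_dim, s_dim
        self.temp = torch.tensor(temp)
        # High (temporal-abstraction) level: the abstract state z updates only
        # when a boundary is detected.
        self.z_rnn = nn.GRUCell(z_dim, hid)
        self.z_prior = nn.Linear(hid, 2 * z_dim)     # mean and log-std of z
        # Low (observation) level: the state s transitions at every step,
        # conditioned on the current abstract state z.
        self.s_rnn = nn.GRUCell(s_dim + z_dim, hid)
        self.s_prior = nn.Linear(hid, 2 * s_dim)     # mean and log-std of s
        self.boundary = nn.Linear(hid, 1)            # logits of the boundary m_t
        self.decoder = nn.Linear(s_dim, obs_dim)     # s_t -> observation x_t

    def generate(self, batch, steps):
        hz = torch.zeros(batch, self.hid)
        hs = torch.zeros(batch, self.hid)
        z = torch.zeros(batch, self.z_dim)
        s = torch.zeros(batch, self.s_dim)
        xs = []
        for _ in range(steps):
            # Sample a relaxed binary boundary m_t from the low-level context.
            m = RelaxedBernoulli(self.temp, logits=self.boundary(hs)).rsample()
            # High-level transition, gated by m_t: z changes only on a "jump".
            hz_new = self.z_rnn(z, hz)
            hz = m * hz_new + (1 - m) * hz
            mu_z, logstd_z = self.z_prior(hz).chunk(2, dim=-1)
            z = m * Normal(mu_z, logstd_z.exp()).rsample() + (1 - m) * z
            # Low-level transition at every step, conditioned on z.
            hs = self.s_rnn(torch.cat([s, z], dim=-1), hs)
            mu_s, logstd_s = self.s_prior(hs).chunk(2, dim=-1)
            s = Normal(mu_s, logstd_s.exp()).rsample()
            xs.append(self.decoder(s))
        return torch.stack(xs, dim=1)  # (batch, steps, obs_dim)


if __name__ == "__main__":
    model = HierarchicalSSM()
    print(model.generate(batch=4, steps=10).shape)  # torch.Size([4, 10, 64])
```

Gating both the high-level RNN state and the sampled latent with the boundary value is one simple way to realize "jumpy" transitions: between boundaries the abstract state is carried over unchanged, so rollouts at the high level can skip over many low-level steps.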
