A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate high-dimensional frames at each time step. The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.
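In the KVAE, the "Kalman" part is a linear-Gaussian state-space model whose observations are the per-frame latent codes a_t produced by the recognition model. Prediction and imputation therefore run in that low-dimensional space rather than in pixel space. The sketch below illustrates the idea in plain Python under simplifying assumptions. A single fixed constant-velocity model (state z_t = [position, velocity], scalar a_t = position) stands in for the paper's learned dynamics, which the paper makes nonlinear by mixing several linear models at each time step. The helper names (`kalman_filter`, `matmul`, `transpose`, `add`) and all numbers are hypothetical. A missing code is marked `None`, and for that step only the predict step runs, so the latent state is extrapolated without generating any frame.

```python
# Minimal Kalman filter over per-frame latent codes a_t, with missing steps.
# Assumption (not from the paper): one fixed constant-velocity model,
# state z_t = [position, velocity], scalar observation a_t = position.
from typing import List, Optional

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(row) for row in zip(*X)]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def kalman_filter(a_seq: List[Optional[float]], A: Matrix, C: Matrix,
                  Q: Matrix, r: float, mu0: List[float],
                  P0: Matrix) -> List[List[float]]:
    """Hypothetical helper: filtered means E[z_t | a_0..a_t].

    Transition z_t = A z_{t-1} + eps, eps ~ N(0, Q); emission
    a_t = C z_t + delta, delta ~ N(0, r). (mu0, P0) is the prior on the
    first state. A None entry marks a missing observation: only the
    predict step runs for that time step.
    """
    n = len(mu0)
    mu = [[m] for m in mu0]  # column vector, n x 1
    P = P0
    means = []
    for t, a in enumerate(a_seq):
        if t > 0:
            # Predict step: propagate mean and covariance through A.
            mu = matmul(A, mu)
            P = add(matmul(matmul(A, P), transpose(A)), Q)
        if a is not None:
            # Update step; the innovation variance S is a scalar for 1-D a_t.
            PCt = matmul(P, transpose(C))        # n x 1
            S = matmul(C, PCt)[0][0] + r
            K = [[row[0] / S] for row in PCt]    # Kalman gain, n x 1
            innovation = a - matmul(C, mu)[0][0]
            mu = [[mu[i][0] + K[i][0] * innovation] for i in range(n)]
            KC = matmul(K, C)
            I_minus_KC = [[(1.0 if i == j else 0.0) - KC[i][j]
                           for j in range(n)] for i in range(n)]
            P = matmul(I_minus_KC, P)
        means.append([row[0] for row in mu])
    return means


# Usage: an object moving at unit speed, with frames 3 to 5 missing.
A = [[1.0, 1.0], [0.0, 1.0]]      # constant-velocity transition
C = [[1.0, 0.0]]                  # observe position only
Q = [[1e-4, 0.0], [0.0, 1e-4]]    # small transition noise
P0 = [[10.0, 0.0], [0.0, 10.0]]   # vague prior on the first state
observed = [0.0, 1.0, 2.0, None, None, None, 6.0, 7.0]

results = kalman_filter(observed, A, C, Q, 1e-2, [0.0, 0.0], P0)
for t, (pos, vel) in enumerate(results):
    tag = "imputed" if observed[t] is None else "observed"
    print(f"t={t}: position={pos:.3f} velocity={vel:.3f} ({tag})")
```

This is a forward filter only, so the imputed positions at steps 3 to 5 depend on steps 0 to 2 alone; a Kalman smoother would additionally condition on the later steps 6 and 7.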

Marco Fraccaro | O. Winther | U. Paquet | Simon Kamronn
