Unsupervised separation of dynamics from pixels

We present an approach to learn the dynamics of multiple objects from image sequences in an unsupervised way. We introduce a probabilistic model that first generate noisy positions for each object through a separate linear state-space model, and then renders the positions of all objects in the same image through a highly non-linear process. Such a linear representation of the dynamics enables us to propose an inference method that uses exact and efficient inference tools and that can be deployed to query the model in different ways without retraining.

[1]  Daan Wierstra,et al.  Recurrent Environment Simulators , 2017, ICLR.

[2]  Sergey Levine,et al.  Stochastic Variational Video Prediction , 2017, ICLR.

[3]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[4]  Yaakov Bar-Shalom,et al.  Estimation and Tracking: Principles, Techniques, and Software , 1993 .

[5]  Silvia Chiappa,et al.  Analysis and Classification of EEG Signals using Probabilistic Models for Brain Computer Interfaces , 2006 .

[6]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[7]  Silvia Chiappa A Bayesian Approach to Switching Linear Gaussian State-Space Models for Unsupervised Time-Series Segmentation , 2008, 2008 Seventh International Conference on Machine Learning and Applications.

[8]  Byron Boots,et al.  Learning to Filter with Predictive State Inference Machines , 2015, ICML.

[9]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[10]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[11]  John P. Cunningham,et al.  Linear dynamical neural population models through nonlinear embeddings , 2016, NIPS.

[12]  Silvia Chiappa,et al.  Explicit-Duration Markov Switching Models , 2014, Found. Trends Mach. Learn..

[13]  Vighnesh Birodkar,et al.  Unsupervised Learning of Disentangled Representations from Video , 2017, NIPS.

[14]  Mohammad Emtiyaz Khan,et al.  Variational Message Passing with Structured Inference Networks , 2018, ICLR.

[15]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[16]  Ryan P. Adams,et al.  Composing graphical models with neural networks for structured representations and fast inference , 2016, NIPS.

[17]  Anna Freud,et al.  Design And Analysis Of Modern Tracking Systems , 2016 .

[18]  D. Barber,et al.  Bayesian Time Series Models: Inference and estimation in probabilistic time series models , 2011 .

[19]  Ole Winther,et al.  A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning , 2017, NIPS.

[20]  Ole Winther,et al.  Sequential Neural Models with Stochastic Layers , 2016, NIPS.

[21]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[22]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[23]  Uri Shalit,et al.  Structured Inference Networks for Nonlinear State Space Models , 2016, AAAI.