Physics-as-Inverse-Graphics: Joint Unsupervised Learning of Objects and Physics from Video

We aim to perform unsupervised discovery of objects and their states (such as location and velocity), together with physical system parameters (such as mass and gravity), from video, given only the differential equations governing the scene dynamics. Existing physical scene understanding methods either require object state supervision or do not integrate with differentiable physics to learn interpretable system parameters and states. We address this problem through a $\textit{physics-as-inverse-graphics}$ approach that brings together vision-as-inverse-graphics and differentiable physics engines. This framework allows us to perform long-term extrapolative video prediction, as well as vision-based model-predictive control. Our approach significantly outperforms related unsupervised methods in long-term future frame prediction of systems with interacting objects (such as ball-spring or 3-body gravitational systems). We further show the value of this tight vision-physics integration by demonstrating data-efficient learning of vision-actuated model-based control for a pendulum system. The controller's interpretability also provides unique capabilities in goal-driven control and physical reasoning for zero-data adaptation.
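To make the idea of learning a physical parameter "given only the differential equations" concrete, here is a minimal sketch and not the paper's method. It assumes a single 1D mass on a spring obeying x'' = -k x, and it assumes object positions are observed directly, so the vision component is omitted entirely. The integrator is differentiated by hand (forward sensitivities) so that the spring constant k can be recovered by gradient descent. The function names, step size, and learning rate are illustrative choices, not taken from the source.

```python
def rollout(k, x0, v0, dt, steps):
    """Semi-implicit Euler rollout of x'' = -k * x.

    Also propagates dx/dk and dv/dk through every integration step,
    so the trajectory is differentiable with respect to k.
    """
    x, v = x0, v0
    dx, dv = 0.0, 0.0  # sensitivities of position and velocity w.r.t. k
    xs, dxs = [], []
    for _ in range(steps):
        # Velocity update v <- v - k*x*dt, and its derivative w.r.t. k.
        dv = dv - (x + k * dx) * dt
        v = v - k * x * dt
        # Position update x <- x + v*dt (uses the new velocity).
        dx = dx + dv * dt
        x = x + v * dt
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def fit_spring_constant(observed, k_init=1.0, x0=1.0, v0=0.0,
                        dt=0.05, lr=0.5, iters=500):
    """Gradient descent on the mean squared position error w.r.t. k."""
    k = k_init
    n = len(observed)
    for _ in range(iters):
        xs, dxs = rollout(k, x0, v0, dt, n)
        # d/dk of mean((x_t - obs_t)^2) via the chain rule.
        grad = sum(2.0 * (x - o) * d for x, o, d in zip(xs, observed, dxs)) / n
        k -= lr * grad
    return k


if __name__ == "__main__":
    # Synthetic "observations" generated with a hidden spring constant.
    true_k = 2.0
    observed, _ = rollout(true_k, 1.0, 0.0, 0.05, 40)
    print("estimated k:", fit_spring_constant(observed, k_init=1.0))
```

In the setting the abstract describes, the observed positions would themselves have to be inferred from video frames without supervision. This sketch covers only the parameter-estimation half, under the stated assumptions.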
