A Compositional Object-Based Approach to Learning Physical Dynamics

We present the Neural Physics Engine (NPE), a framework for learning simulators of intuitive physics that naturally generalize across variable object count and different scene configurations. We propose a factorization of a physical scene into composable object-based representations and a neural network architecture whose compositional structure factorizes object dynamics into pairwise interactions. Like a symbolic physics engine, the NPE is endowed with generic notions of objects and their interactions; realized as a neural network, it can be trained via stochastic gradient descent to adapt to specific object properties and dynamics of different worlds. We evaluate the efficacy of our approach on simple rigid body dynamics in two-dimensional worlds. By comparing to less structured architectures, we show that the NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.

[1]  John R. Anderson Cognitive Psychology and Its Implications , 1980 .

[2]  Gerald J. Sussman,et al.  Structure and interpretation of computer programs , 1985, Proceedings of the IEEE.

[3]  Elizabeth S. Spelke,et al.  Principles of Object Perception , 1990, Cogn. Sci..

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Matthew Brand,et al.  Physics-Based Visual Understanding , 1997, Comput. Vis. Image Underst..

[6]  Geoffrey E. Hinton,et al.  NeuroAnimator: fast neural network emulation and control of physics-based models , 1998, SIGGRAPH.

[7]  Z. Pylyshyn,et al.  Dynamics of target selection in multiple object tracking (MOT). , 2006, Spatial vision.

[8]  Geoffrey E. Hinton,et al.  The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.

[9]  Armando Solar-Lezama,et al.  Program synthesis by sketching , 2008 .

[10]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[11]  Jessica B. Hamrick Internal physics models guide probabilistic judgments about object dynamics , 2011 .

[12]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[13]  Charles Kemp,et al.  How to Grow a Mind: Statistics, Structure, and Abstraction , 2011, Science.

[14]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[15]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[16]  Joshua B. Tenenbaum,et al.  Noisy Newtons: Unifying process and dependency accounts of causal attribution , 2012, CogSci.

[17]  Kevin A. Smith,et al.  Sources of uncertainty in intuitive physics , 2012, CogSci.

[18]  Jessica B. Hamrick,et al.  Simulation as an engine of physical scene understanding , 2013, Proceedings of the National Academy of Sciences.

[19]  Noah D. Goodman,et al.  Learning physics from dynamical scenes , 2014 .

[20]  Joshua B. Tenenbaum,et al.  Inverse Graphics with Probabilistic CAD Models , 2014, ArXiv.

[21]  Yang Wang,et al.  rnn : Recurrent Library for Torch , 2015, ArXiv.

[22]  Joshua B. Tenenbaum,et al.  Humans predict liquid dynamics using probabilistic simulation , 2015, CogSci.

[23]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[24]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[25]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[26]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[27]  Armando Solar-Lezama,et al.  Unsupervised Learning by Program Synthesis , 2015, NIPS.

[28]  Joshua B. Tenenbaum,et al.  Picture: A probabilistic programming language for scene perception , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[30]  Jiajun Wu,et al.  Physics 101: Learning Physical Object Properties from Unlabeled Videos , 2016, BMVC.

[31]  Joshua B. Tenenbaum,et al.  Understanding Visual Concepts with Continuation Learning , 2016, ArXiv.

[32]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[33]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[34]  Jitendra Malik,et al.  Learning Visual Predictive Models of Physics for Playing Billiards , 2015, ICLR.

[35]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[37]  Ali Farhadi,et al.  Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[39]  Dan Klein,et al.  Learning to Compose Neural Networks for Question Answering , 2016, NAACL.

[40]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[41]  Xinyun Chen Under Review as a Conference Paper at Iclr 2017 Delving into Transferable Adversarial Ex- Amples and Black-box Attacks , 2016 .

[42]  Mario Fritz,et al.  To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction , 2016, ArXiv.

[43]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[44]  Pushmeet Kohli,et al.  TerpreT: A Probabilistic Programming Language for Program Induction , 2016, ArXiv.

[45]  Nando de Freitas,et al.  Neural Programmer-Interpreters , 2015, ICLR.

[46]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[47]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Ali Farhadi,et al.  "What Happens If..." Learning to Predict the Effect of Forces in Images , 2016, ECCV.

[49]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[50]  Omer Levy,et al.  Published as a conference paper at ICLR 2018 S IMULATING A CTION D YNAMICS WITH N EURAL P ROCESS N ETWORKS , 2018 .