Reasoning About Physical Interactions with Object-Oriented Prediction and Planning

Object-based factorizations provide a useful level of abstraction for interacting with the world. Building explicit object representations, however, often requires supervisory signals that are difficult to obtain in practice. We present a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties. Our model, Object-Oriented Prediction and Planning (O2P2), jointly learns a perception function to map from image observations to object representations, a pairwise physics interaction function to predict the time evolution of a collection of objects, and a rendering function to map objects back to pixels. For evaluation, we consider not only the accuracy of the physical predictions of the model, but also its utility for downstream tasks that require an actionable representation of intuitive physics. After training our model on an image prediction task, we can use its learned representations to build block towers more complicated than those observed during training.

[1]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[2]  Patrick Henry Winston,et al.  Learning structural descriptions from examples , 1970 .

[3]  Dirk P. Kroese,et al.  The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (Information Science and Statistics) , 2004 .

[4]  Andre Cohen,et al.  An object-oriented representation for efficient reinforcement learning , 2008, ICML '08.

[5]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[6]  Jessica B. Hamrick Internal physics models guide probabilistic judgments about object dynamics , 2011 .

[7]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[8]  David Wingate,et al.  A Physics-Based Model Prior for Object-Oriented MDPs , 2014, ICML.

[9]  Kun Zhou,et al.  Imagining the unseen , 2014, ACM Trans. Graph..

[10]  Katsushi Ikeuchi,et al.  Scene Understanding by Reasoning Stability and Safety , 2015, International Journal of Computer Vision.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Tsuhan Chen,et al.  3D Reasoning from Blocks to Stability , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[15]  Jitendra Malik,et al.  Learning Visual Predictive Models of Physics for Playing Billiards , 2015, ICLR.

[16]  Ali Farhadi,et al.  Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[18]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[19]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[20]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[21]  Mario Fritz,et al.  Visual stability prediction for robotic manipulation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Joshua B. Tenenbaum,et al.  A Compositional Object-Based Approach to Learning Physical Dynamics , 2016, ICLR.

[23]  Jiajun Wu,et al.  Neural Scene De-rendering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jiajun Wu,et al.  Learning to See Physics via Visual De-animation , 2017, NIPS.

[25]  Sergey Levine,et al.  Deep Object-Centric Representations for Generalizable Robot Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[26]  Pascal Poupart,et al.  Unsupervised Video Object Segmentation for Deep Reinforcement Learning , 2018, NeurIPS.

[27]  Sergey Levine,et al.  Stochastic Variational Video Prediction , 2017, ICLR.

[28]  Jürgen Schmidhuber,et al.  Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions , 2018, ICLR.

[29]  Sergey Levine,et al.  Stochastic Adversarial Video Prediction , 2018, ArXiv.

[30]  Emma Brunskill,et al.  Strategic Object Oriented Reinforcement Learning , 2018, ArXiv.

[31]  Niloy J. Mitra,et al.  Taking Visual Motion Prediction To New Heightfields , 2019, Comput. Vis. Image Underst..