Learning Visual Predictive Models of Physics for Playing Billiards

The ability to plan and execute goal-specific actions in varied, unexpected settings is a central requirement of intelligent agents. In this paper, we explore how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination"). Our models directly process raw visual input, and use a novel object-centric prediction formulation based on visual glimpses centered on objects (fixations) to enforce translational invariance of the learned physical laws. The agent gathers training data through random interaction with a collection of different environments, and the resulting model can then be used to plan goal-directed actions in novel environments that the agent has not seen before. We demonstrate that our agent can accurately plan actions for playing a simulated billiards game, which requires pushing a ball into a target position or into collision with another ball.
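The planning-by-simulation idea above can be sketched as a random-shooting planner: sample candidate pushes, roll each one forward with the learned dynamics model ("visual imagination"), and keep the push whose imagined trajectory ends closest to the goal. The sketch below is illustrative, not the paper's implementation; `forward_model` stands in for the learned object-centric predictor, and the state encoding, force limits, and horizon are assumptions.

```python
import random

def plan_action(forward_model, state, goal, n_candidates=300, horizon=10):
    """Pick the initial push whose imagined rollout ends nearest the goal.

    forward_model(state, action) -> next state; here it plays the role of
    the learned visual dynamics model, simulating one time step.
    state: (x, y, vx, vy) of the ball; goal: target (x, y) position.
    """
    best_action, best_cost = None, float("inf")
    for _ in range(n_candidates):
        # Candidate push: a 2D force applied only at the first time step.
        force = (random.uniform(-1, 1), random.uniform(-1, 1))
        s = state
        for t in range(horizon):
            a = force if t == 0 else (0.0, 0.0)
            s = forward_model(s, a)  # one step of "visual imagination"
        # Cost: squared distance between the imagined final position and the goal.
        cost = (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2
        if cost < best_cost:
            best_action, best_cost = force, cost
    return best_action

def toy_model(s, a):
    """Stand-in dynamics for demonstration: a point mass with friction.
    The paper learns this mapping from raw visual input instead."""
    x, y, vx, vy = s
    vx, vy = 0.9 * (vx + a[0]), 0.9 * (vy + a[1])
    return (x + vx, y + vy, vx, vy)
```

With the toy dynamics, `plan_action(toy_model, (0, 0, 0, 0), (3, 2))` searches for a single push that carries the ball from the origin toward the target; swapping in a learned model gives the goal-directed planning described above.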
