Learning to Act by Predicting the Future

We present an approach to sensorimotor control in immersive environments. Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.
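To make the idea in the abstract concrete, here is a minimal sketch, in Python/NumPy, of goal-conditioned action selection via future prediction, as the paper describes: the model predicts future changes in the low-dimensional measurement stream for each candidate action, and acts greedily with respect to a goal vector over measurements. All names here (predict_futures, TEMPORAL_OFFSETS, select_action) and the dimensions are hypothetical illustrations, and the predictor is a random stand-in for the learned deep network; this is not the authors' implementation.

```python
# Minimal sketch: act by predicting future measurements, score with a goal vector.
# Assumption: the learned model outputs, per action, predicted measurement
# changes at several temporal offsets; here it is replaced by random noise.
import numpy as np

NUM_ACTIONS = 4                            # e.g. discrete movement actions
MEAS_DIM = 3                               # e.g. (health, ammo, frags)
TEMPORAL_OFFSETS = [1, 2, 4, 8, 16, 32]    # steps into the future

rng = np.random.default_rng(0)

def predict_futures(obs, meas, goal):
    """Stand-in for the trained network. Returns predicted measurement
    changes of shape (NUM_ACTIONS, len(TEMPORAL_OFFSETS), MEAS_DIM).
    The real model is trained by plain supervised regression against the
    measurement changes actually observed while interacting."""
    return rng.normal(size=(NUM_ACTIONS, len(TEMPORAL_OFFSETS), MEAS_DIM))

def select_action(obs, meas, goal):
    """Score each action by the goal-weighted sum of its predicted future
    measurement changes, then act greedily."""
    preds = predict_futures(obs, meas, goal)   # (A, T, M)
    scores = preds.sum(axis=1) @ goal          # collapse offsets, weight by goal -> (A,)
    return int(np.argmax(scores))

# The goal enters only at action-selection time, so one trained predictor
# can pursue different, dynamically changing objectives:
goal_survive = np.array([1.0, 0.0, 0.0])   # care only about health
goal_frag    = np.array([0.3, 0.3, 1.0])   # trade off health/ammo for frags
obs, meas = None, np.zeros(MEAS_DIM)
action = select_action(obs, meas, goal_frag)
```

This sketch reflects why the approach can be trained with supervised learning but without extraneous supervision: the measurement stream supplies a dense regression target at every time step, in contrast to the sparse scalar rewards of standard reinforcement learning formulations, and decoupling the goal from training is what allows pursuing new goals at test time.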
