Learning with AMIGo: Adversarially Motivated Intrinsic Goals

A key challenge for reinforcement learning (RL) is learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by drawing on various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating a goal-generating "teacher" that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging, yet achievable, goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.
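To make the "constructively adversarial" objective more concrete, below is a minimal Python sketch of one plausible way such a teacher reward could be structured. It assumes a difficulty threshold of student steps (called `t_star` here): the teacher is rewarded when the student reaches a proposed goal only after taking more than `t_star` steps, and penalised when the goal is reached too quickly or not at all, with the threshold raised as the student improves. The class and parameter names (`ConstructivelyAdversarialTeacher`, `alpha`, `beta`, `patience`) and the specific threshold-raising schedule are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
# Hedged sketch of a "constructively adversarial" teacher reward:
# reward goals that are challenging but achievable for the student,
# penalise goals that are too easy or unreachable, and raise the
# difficulty threshold as the student gets better.

from dataclasses import dataclass


@dataclass
class TeacherRewardConfig:
    alpha: float = 0.7     # reward when the goal is hard enough (assumed value)
    beta: float = 0.3      # penalty when the goal is too easy or unreachable (assumed value)
    t_star_init: int = 1   # initial difficulty threshold, in student steps
    t_star_step: int = 1   # how much to raise the threshold on repeated easy successes
    patience: int = 10     # consecutive easy successes before raising t_star (assumed schedule)


class ConstructivelyAdversarialTeacher:
    """Tracks the difficulty threshold and computes the teacher's reward."""

    def __init__(self, cfg: TeacherRewardConfig):
        self.cfg = cfg
        self.t_star = cfg.t_star_init
        self.easy_successes = 0

    def reward(self, goal_reached: bool, steps_taken: int) -> float:
        """Return the teacher's reward for one proposed goal."""
        if goal_reached and steps_taken > self.t_star:
            # Challenging but achievable: reward the teacher.
            self.easy_successes = 0
            return self.cfg.alpha
        if goal_reached:
            # Reached too quickly: goal was too easy, penalise the teacher
            # and raise the threshold after repeated easy successes.
            self.easy_successes += 1
            if self.easy_successes >= self.cfg.patience:
                self.t_star += self.cfg.t_star_step
                self.easy_successes = 0
            return -self.cfg.beta
        # Never reached within the episode: goal was too hard, penalise the teacher.
        self.easy_successes = 0
        return -self.cfg.beta


if __name__ == "__main__":
    teacher = ConstructivelyAdversarialTeacher(TeacherRewardConfig())
    print(teacher.reward(goal_reached=True, steps_taken=5))    # +alpha: hard enough
    print(teacher.reward(goal_reached=False, steps_taken=50))  # -beta: unreachable
```

A goal-conditioned student would be trained alongside this teacher with an intrinsic reward for reaching each proposed goal (e.g., a fixed bonus on success); pushing the teacher toward goals at the frontier of the student's ability is what produces the natural curriculum described above.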
