Learning with AMIGo: Adversarially Motivated Intrinsic Goals

A key challenge for reinforcement learning (RL) is learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating, as a form of meta-learning, a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging, yet achievable, goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.
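
To make the "constructively adversarial" objective concrete, the sketch below shows one way such a teacher reward could be structured: the teacher is rewarded only when the student reaches a proposed goal after at least a minimum number of steps, and that minimum grows as the teacher keeps succeeding, so proposed goals become progressively harder while remaining achievable. The function names, constants, and threshold-update rule are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a "constructively adversarial" teacher reward.
# All names, constants, and the threshold-update rule are assumptions
# chosen for clarity, not the paper's exact formulation.

def teacher_reward(goal_reached: bool, steps_taken: int,
                   difficulty_threshold: int,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Reward the teacher for goals the student reaches, but only after
    at least `difficulty_threshold` steps; penalize goals that are
    trivial (reached too quickly) or never reached."""
    if goal_reached and steps_taken >= difficulty_threshold:
        return alpha      # challenging yet achievable goal
    return -beta          # too easy, or unreachable within the episode


def update_threshold(difficulty_threshold: int,
                     recent_teacher_rewards: list,
                     window: int = 10, step: int = 1) -> int:
    """Hypothetical curriculum schedule: once the teacher earns positive
    reward on most of the last `window` goals, raise the difficulty
    threshold so future goals must take the student longer to reach."""
    recent = recent_teacher_rewards[-window:]
    if len(recent) == window and sum(r > 0 for r in recent) > window // 2:
        return difficulty_threshold + step
    return difficulty_threshold
```

Coupled with a goal-conditioned student that receives intrinsic reward for reaching each proposed goal, a loop of this kind yields the self-generated curriculum described above: early goals are easy, and the rising threshold pushes the teacher toward goals that require the student to acquire progressively more general skills.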
