Automated curricula through setter-solver interactions

Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments, these correlations are often too weak, or rewarding events too infrequent, to make learning feasible. Human education instead relies on curricula, which break tasks down into simpler, static challenges with dense rewards, to build up to complex behaviors. While curricula are also useful for artificial agents, hand-crafting them is time-consuming. This has led researchers to explore automatic curriculum generation. Here we explore automatic curriculum generation in rich, dynamic environments. Using a setter-solver paradigm, in which a setter model proposes goals for a goal-conditioned solver agent to attempt, we show the importance of considering goal validity, goal feasibility, and goal coverage when constructing useful curricula. We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent must achieve a single goal selected from a set of possible goals that varies between episodes, and we identify challenges for future work. Finally, we demonstrate the value of a novel technique that guides agents towards a desired goal distribution. Altogether, these results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and to our knowledge are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.
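To make the setter-solver interaction concrete, the sketch below runs a minimal curriculum loop on a toy one-dimensional goal space. Everything in it is a hypothetical stand-in: `ToySetter`, `ToySolver`, and the scalar `feasibility_w`/`coverage_w` updates only mimic the direction of the validity, feasibility, and coverage objectives, which in the full method are losses on neural-network setters and solvers rather than hand-set scalar nudges.

```python
# Minimal sketch of a setter-solver curriculum loop on a 1-D "difficulty"
# goal space. ToySetter, ToySolver, and the loss weights are hypothetical
# simplifications of the paper's neural-network components.
import random

random.seed(0)


class ToySolver:
    """Goal-conditioned agent; success probability falls with goal difficulty."""

    def __init__(self):
        self.skill = 0.2  # proxy for the solver's current ability

    def attempt(self, goal: float) -> bool:
        achieved = random.random() < max(0.0, min(1.0, self.skill - goal + 0.5))
        if achieved:
            self.skill = min(1.0, self.skill + 0.01)  # solver improves on success
        return achieved


class ToySetter:
    """Proposes goals, adapting their difficulty to the solver's ability."""

    def __init__(self):
        self.difficulty = 0.05

    def sample_goal(self) -> float:
        # Validity: only propose goals that exist in the environment (here, [0, 1]).
        return max(0.0, min(1.0, random.gauss(self.difficulty, 0.05)))

    def update(self, achieved: bool,
               feasibility_w: float = 0.01, coverage_w: float = 0.002) -> None:
        # Feasibility: keep goals near the frontier of what the solver can do,
        # pushing harder after successes and backing off after failures.
        self.difficulty += feasibility_w if achieved else -feasibility_w / 2
        # Coverage: constant outward pressure so proposed goals keep diversifying.
        self.difficulty = max(0.0, self.difficulty + coverage_w)


setter, solver = ToySetter(), ToySolver()
for episode in range(2000):
    goal = setter.sample_goal()      # setter proposes this episode's goal
    achieved = solver.attempt(goal)  # solver tries to achieve it
    setter.update(achieved)          # setter adapts from the outcome

print(f"setter difficulty: {setter.difficulty:.2f}, solver skill: {solver.skill:.2f}")
```

In the paper's full setup the setter is a generative model over goals and feasibility is estimated by a learned judge of the solver's success probability; the toy updates above capture only the qualitative effect of those training signals.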
