Automated curricula through setter-solver interactions

Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments, these correlations are often too small, or rewarding events too infrequent, to make learning feasible. Human education instead relies on curricula--the breakdown of tasks into simpler, static challenges with dense rewards--to build up to complex behaviors. While curricula are also useful for artificial agents, hand-crafting them is time-consuming. This has led researchers to explore automatic curriculum generation, which we pursue here in rich, dynamic environments. Using a setter-solver paradigm, we show the importance of considering goal validity, goal feasibility, and goal coverage in constructing useful curricula. We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent must achieve a single goal selected from a set of possible goals that varies between episodes, and we identify challenges for future work. Finally, we demonstrate the value of a novel technique that guides agents towards a desired goal distribution. Altogether, these results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and to our knowledge are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.
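
To make the setter-solver paradigm concrete, a minimal sketch of the three setter objectives named above (validity, feasibility, coverage) follows. Everything in it is an illustrative assumption rather than the paper's implementation: a toy 2-D goal space, a small MLP setter in place of a flow-based generative model, a hand-coded stand-in for the RL solver, crude surrogates for the coverage and validity terms, and arbitrary loss weights.

import torch
import torch.nn as nn

class Setter(nn.Module):
    """Maps (latent z, desired feasibility f) to a goal in [0, 1]^2."""
    def __init__(self, zdim=4):
        super().__init__()
        self.zdim = zdim
        self.net = nn.Sequential(nn.Linear(zdim + 1, 64), nn.ReLU(),
                                 nn.Linear(64, 2), nn.Sigmoid())

    def sample(self, f):
        z = torch.randn(f.shape[0], self.zdim)
        return self.net(torch.cat([z, f], dim=-1))

class Judge(nn.Module):
    """Predicts (as a logit) the probability that the solver achieves a goal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, goal):
        return self.net(goal)

def solver_attempt(goal, skill):
    """Stand-in for an RL solver: succeeds more often on easy (near-origin)
    goals, and more often as `skill` grows. A real solver would be a
    goal-conditioned policy trained on the sparse task reward."""
    difficulty = goal.norm(dim=-1, keepdim=True)
    return torch.bernoulli(torch.sigmoid(skill - 4.0 * difficulty))

setter, judge = Setter(), Judge()
opt_s = torch.optim.Adam(setter.parameters(), lr=1e-3)
opt_j = torch.optim.Adam(judge.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    skill = step / 500.0                  # pretend the solver slowly improves
    f = torch.rand(64, 1)                 # requested feasibility per goal
    goals = setter.sample(f)
    success = solver_attempt(goals.detach(), skill)

    # Judge: binary classification of the solver's actual outcomes.
    opt_j.zero_grad()
    bce(judge(goals.detach()), success).backward()
    opt_j.step()

    # Setter: feasibility pushes judged difficulty toward the requested f;
    # coverage (batch variance) is a crude diversity surrogate; validity
    # (pull toward achieved goals) is a crude stand-in for maximizing the
    # likelihood of solver-achieved goals under the setter.
    opt_s.zero_grad()
    feasibility = bce(judge(goals), f)
    coverage = -goals.var(dim=0).mean()
    achieved = goals[success.squeeze(-1) > 0].detach()
    validity = (((setter.sample(f[:len(achieved)]) - achieved) ** 2).mean()
                if len(achieved) else torch.zeros(()))
    (feasibility + 0.1 * coverage + 0.1 * validity).backward()
    opt_s.step()

The structural point the sketch preserves is the division of labor: the judge is fit to the solver's observed outcomes, while the setter is trained so that judged feasibility matches a requested difficulty, keeping proposed goals near the frontier of the solver's current competence.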
