Adaptive Procedural Task Generation for Hard-Exploration Problems

We introduce Adaptive Procedural Task Generation (APT-Gen), an approach that progressively generates a sequence of tasks as a curriculum to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks via a black-box procedural generation module by adaptively sampling from the parameterized task space. To enable curriculum learning in the absence of a direct indicator of learning progress, the task generator is trained by balancing the agent's expected return in the generated tasks against their similarity to the target task. Through adversarial training, this similarity is adaptively estimated by a task discriminator defined on the agent's behaviors. In this way, our approach can efficiently generate tasks with rich variations even when the target task has an unknown parameterization or is not covered by the predefined task space. Experiments in various scenarios demonstrate the effectiveness of our approach through quantitative and qualitative analysis.
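To make the training scheme concrete, below is a minimal Python/PyTorch sketch of one possible instantiation of this loop. Everything here is an illustrative assumption rather than the paper's implementation: the names (TaskGenerator, TaskDiscriminator, rollout), the network sizes, the trade-off weight alpha, and the REINFORCE-style generator update are all hypothetical, and rollout is a stub standing in for instantiating a task via the black-box procedural generation module and running the agent's current policy in it.

```python
# A minimal sketch of the APT-Gen-style training loop described above:
# a task generator proposes task parameters, a discriminator adversarially
# estimates task similarity from the agent's behaviors, and the generator
# is updated to balance expected return against that similarity.
import torch
import torch.nn as nn
from torch.distributions import Normal

# Hypothetical dimensions for the parameterized task space and the
# agent's observation/action spaces.
LATENT_DIM, TASK_DIM, OBS_DIM, ACT_DIM = 8, 4, 16, 2

class TaskGenerator(nn.Module):
    """Gaussian policy over task parameters that are fed to the
    black-box procedural generation module."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                nn.Linear(64, TASK_DIM))
        self.log_std = nn.Parameter(torch.zeros(TASK_DIM))

    def dist(self, z):
        return Normal(self.mu(z), self.log_std.exp())

class TaskDiscriminator(nn.Module):
    """Scores (state, action) pairs: trained to separate the agent's
    behavior in generated tasks from its behavior in the target task."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def rollout(task_params):
    """Stub: stands in for instantiating a task from `task_params` via the
    procedural generation module and running the agent's current policy
    in it. Returns (states, actions, episode return)."""
    T = 32
    return torch.randn(T, OBS_DIM), torch.randn(T, ACT_DIM), torch.rand(())

gen, disc = TaskGenerator(), TaskDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
alpha = 0.5  # trade-off between expected return and target-task similarity

for step in range(1000):
    # Sample task parameters and collect behavior in the generated task
    # and in the target task (with the same agent policy).
    z = torch.randn(1, LATENT_DIM)
    pi = gen.dist(z)
    w = pi.sample()
    s_gen, a_gen, ret = rollout(w.squeeze(0))
    s_tgt, a_tgt, _ = rollout(None)  # None stands for the target task itself

    # Adversarial discriminator update: generated-task behavior gets
    # label 0, target-task behavior gets label 1.
    d_loss = (bce(disc(s_gen, a_gen), torch.zeros(len(s_gen), 1)) +
              bce(disc(s_tgt, a_tgt), torch.ones(len(s_tgt), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: the procedural generation module is a black box,
    # so gradients cannot flow through it; a score-function (REINFORCE)
    # estimator on the combined objective is used instead.
    with torch.no_grad():
        similarity = torch.sigmoid(disc(s_gen, a_gen)).mean()
    score = alpha * ret + (1 - alpha) * similarity
    g_loss = -pi.log_prob(w).sum() * score
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # (The agent itself would be trained concurrently on the generated
    # tasks with any off-the-shelf RL algorithm; omitted here.)
```

The score-function update reflects the black-box nature of the procedural generation module: because task parameters enter through a non-differentiable simulator, the generator can only learn from scalar feedback, here a weighted sum of the agent's return and the discriminator's similarity estimate.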
