A Performance-Based Start State Curriculum Framework for Reinforcement Learning

Sparse reward problems present a challenge for reinforcement learning (RL) agents. Previous work has shown that choosing start states according to a curriculum can significantly improve learning performance. We observe that many existing curriculum generation algorithms rely on two key components: performance measure estimation and a start state selection policy. Therefore, we propose a unifying framework for performance-based start state curricula in RL, which allows us to analyze and compare how each of these two components influences learning performance. Furthermore, we introduce a new start state selection policy based on spatial gradients of the performance measure. We conduct extensive empirical evaluations to compare performance-based start state curricula and to investigate the influence of the choice and estimation of the performance measure model. Benchmarking on difficult robotic navigation tasks and a high-dimensional robotic manipulation task, we demonstrate state-of-the-art performance of our novel spatial gradient curriculum.
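To make the idea of a spatial-gradient start state selection policy concrete, the sketch below scores each candidate start state by how steeply an estimated performance measure (e.g. success probability) changes between that start and its spatial neighbours, and then samples training start states with softmax probabilities over those scores. This is a minimal illustration under assumed names (`candidate_starts`, `neighbours`, `performance`), not the paper's exact implementation.

```python
import numpy as np

def spatial_gradient_scores(candidate_starts, neighbours, performance):
    """Score each candidate start by the local (spatial) gradient magnitude
    of the performance measure.

    candidate_starts: list of hashable start states
    neighbours:       dict mapping a start state to a list of nearby starts
    performance:      dict mapping a start state to its estimated performance
                      (e.g. a success rate in [0, 1])
    """
    scores = {}
    for s in candidate_starts:
        neigh = neighbours.get(s, [])
        if not neigh:
            scores[s] = 0.0
            continue
        # A large performance difference to neighbouring starts marks the
        # boundary of the region the agent can already solve.
        diffs = [abs(performance[s] - performance[n]) for n in neigh]
        scores[s] = float(np.mean(diffs))
    return scores

def select_start_states(candidate_starts, neighbours, performance,
                        num_starts, temperature=1.0, rng=None):
    """Sample start states with probability given by a softmax over the
    spatial-gradient scores (higher gradient -> more likely)."""
    rng = rng or np.random.default_rng()
    scores = spatial_gradient_scores(candidate_starts, neighbours, performance)
    values = np.array([scores[s] for s in candidate_starts])
    logits = values / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(len(candidate_starts), size=num_starts, p=probs)
    return [candidate_starts[i] for i in idx]
```

In this sketch the performance estimates would be refreshed periodically from rollout statistics, so the sampling distribution tracks the moving frontier between already-solved and not-yet-solved start states; the temperature parameter (an assumption here) trades off concentrating on the frontier versus spreading starts more uniformly.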
