Curiosity-Driven Multi-Criteria Hindsight Experience Replay

Author(s): Lanier, John Banister | Advisor(s): Baldi, Pierre | Abstract: Dealing with sparse rewards is a longstanding challenge in reinforcement learning. Recent hindsight methods have achieved success on a variety of sparse-reward tasks, but they fail on complex tasks such as stacking multiple blocks with a robot arm in simulation. Curiosity-driven exploration, which uses the prediction error of a learned dynamics model as an intrinsic reward, has been shown to be effective for exploring a number of sparse-reward environments. We present a method that combines hindsight with curiosity-driven exploration and curriculum learning to solve the challenging sparse-reward block-stacking task. We are the first to stack more than two blocks using only a sparse reward and no human demonstrations.
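To make the two core ingredients named in the abstract concrete, the sketch below illustrates (a) an intrinsic curiosity reward computed as the prediction error of a learned forward dynamics model and (b) HER-style goal relabeling of a failed trajectory. This is a minimal illustration under assumed interfaces, not the thesis implementation; the class and function names (`ForwardModel`, `her_relabel`, `reward_fn`) and the linear model are hypothetical simplifications.

```python
import numpy as np

class ForwardModel:
    """Toy linear dynamics model f(s, a) -> s'.
    Its squared prediction error serves as the intrinsic (curiosity) reward."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = self.predict(state, action) - next_state
        self.W -= self.lr * np.outer(error, x)    # one gradient step on squared error
        return float(np.sum(error ** 2))          # prediction error = curiosity bonus

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    """HER 'future' strategy sketch: relabel each transition with goals that were
    actually achieved later in the same episode, so failed trajectories still
    produce transitions with non-trivial sparse reward."""
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        for i in rng.randint(t, len(episode), size=k):
            new_goal = episode[i][2]              # a state achieved later in the episode
            r = reward_fn(s_next, new_goal)       # sparse reward recomputed for the new goal
            relabeled.append((s, a, r, s_next, new_goal))
    return relabeled
```

In a combined agent, the curiosity bonus from `ForwardModel.update` would be added to the (mostly zero) sparse task reward when storing transitions, while `her_relabel` augments the replay buffer with achievable goals; curriculum learning then orders the goals or tasks presented to the agent, from easier to harder stacking configurations.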
