Curiosity-Driven Multi-Criteria Hindsight Experience Replay

Author(s): Lanier, John Banister | Advisor(s): Baldi, Pierre | Abstract: Dealing with sparse rewards is a longstanding challenge in reinforcement learning. Recent hindsight methods have achieved success on a variety of sparse-reward tasks, but they fail on complex tasks such as stacking multiple blocks with a robot arm in simulation. Curiosity-driven exploration, which uses the prediction error of a learned dynamics model as an intrinsic reward, has been shown to be effective for exploring a number of sparse-reward environments. We present a method that combines hindsight with curiosity-driven exploration and curriculum learning to solve the challenging sparse-reward block-stacking task. We are the first to stack more than two blocks using only a sparse reward and no human demonstrations.
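To make the two core ingredients named in the abstract concrete, the sketch below illustrates (a) an intrinsic curiosity reward computed as the prediction error of a learned forward dynamics model and (b) HER-style goal relabeling of a failed trajectory. This is a minimal illustration under assumed interfaces, not the thesis implementation; the class and function names (`ForwardModel`, `her_relabel`, `reward_fn`) and the linear model are hypothetical simplifications.

```python
import numpy as np

class ForwardModel:
    """Toy linear dynamics model f(s, a) -> s'.
    Its squared prediction error serves as the intrinsic (curiosity) reward."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = self.predict(state, action) - next_state
        self.W -= self.lr * np.outer(error, x)    # one gradient step on squared error
        return float(np.sum(error ** 2))          # prediction error = curiosity bonus

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    """HER 'future' strategy sketch: relabel each transition with goals that were
    actually achieved later in the same episode, so failed trajectories still
    produce transitions with non-trivial sparse reward."""
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        for i in rng.randint(t, len(episode), size=k):
            new_goal = episode[i][2]              # a state achieved later in the episode
            r = reward_fn(s_next, new_goal)       # sparse reward recomputed for the new goal
            relabeled.append((s, a, r, s_next, new_goal))
    return relabeled
```

In a combined agent, the curiosity bonus from `ForwardModel.update` would be added to the (mostly zero) sparse task reward when storing transitions, while `her_relabel` augments the replay buffer with achievable goals; curriculum learning then orders the goals or tasks presented to the agent, from easier to harder stacking configurations.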
