CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning

In open-ended environments, autonomous learning agents must set their own goals and build their own curriculum through an intrinsically motivated exploration. They may consider a large diversity of goals, aiming to discover what is controllable in their environments, and what is not. Because some goals might prove easy and some impossible, agents must actively select which goal to practice at any moment, to maximize their overall mastery on the set of learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a modular Universal Value Function Approximator with hindsight learning to achieve a diversity of goals of different kinds within a unique policy and 2) an automated curriculum learning mechanism that biases the attention of the agent towards goals maximizing the absolute learning progress. Agents focus sequentially on goals of increasing complexity, and focus back on goals that are being forgotten. Experiments conducted in a new modular-goal robotic environment show the resulting developmental self-organization of a learning curriculum, and demonstrate properties of robustness to distracting goals, forgetting and changes in body properties.

[1]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[2]  Satinder Singh,et al.  Many-Goals Reinforcement Learning , 2018, ArXiv.

[3]  Wojciech Czarnecki,et al.  Multi-task Deep Reinforcement Learning with PopArt , 2018, AAAI.

[4]  Pierre-Yves Oudeyer,et al.  Modular active curiosity-driven discovery of tool use , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[5]  Martin A. Riedmiller,et al.  Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.

[6]  Herke van Hoof,et al.  Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.

[7]  Volker Tresp,et al.  Energy-Based Hindsight Experience Prioritization , 2018, CoRL.

[8]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[9]  Sergey Levine,et al.  Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.

[10]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[11]  Pieter Abbeel,et al.  Automatic Goal Generation for Reinforcement Learning Agents , 2017, ICML.

[12]  Kate Saenko,et al.  Hierarchical Reinforcement Learning with Hindsight , 2018, ArXiv.

[13]  Pierre-Yves Oudeyer,et al.  GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms , 2017, ICML.

[14]  Pierre-Yves Oudeyer,et al.  Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning , 2017, J. Mach. Learn. Res..

[15]  H. B. Mann,et al.  On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other , 1947 .

[16]  Pierre-Yves Oudeyer,et al.  Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner , 2012, Paladyn J. Behav. Robotics.

[17]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[18]  Leslie Pack Kaelbling,et al.  Learning to Achieve Goals , 1993, IJCAI.

[19]  Yee Whye Teh,et al.  Distral: Robust multitask reinforcement learning , 2017, NIPS.

[20]  Martin A. Riedmiller,et al.  Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards , 2017, ArXiv.

[21]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[22]  Tom Schaul,et al.  Unicorn: Continual Learning with a Universal, Off-policy Agent , 2018, ArXiv.

[23]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[24]  Pierre-Yves Oudeyer,et al.  Curiosity Driven Exploration of Learned Disentangled Goal Spaces , 2018, CoRL.

[25]  Pierre-Yves Oudeyer,et al.  How Evolution May Work Through Curiosity-Driven Developmental Process , 2016, Top. Cogn. Sci..

[26]  Jürgen Schmidhuber,et al.  Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[27]  Pierre-Yves Oudeyer,et al.  Active learning of inverse models with intrinsically motivated goal exploration in robots , 2013, Robotics Auton. Syst..

[28]  Pierre-Yves Oudeyer,et al.  Maximizing Learning Progress: An Internal Reward System for Development , 2003, Embodied Artificial Intelligence.

[29]  Marcin Andrychowicz,et al.  Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research , 2018, ArXiv.

[30]  Karl J. Friston,et al.  A Bayesian Foundation for Individual Learning Under Uncertainty , 2011, Front. Hum. Neurosci..