Persistent Reinforcement Learning via Subgoal Curricula

Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents. However, the success of current reinforcement learning algorithms is predicated on an often under-emphasised requirement: each trial needs to start from a fixed initial state distribution. Unfortunately, resetting the environment to its initial state after each trial requires a substantial amount of human supervision and extensive instrumentation of the environment, which defeats the purpose of autonomous reinforcement learning. In this work, we propose Value-accelerated Persistent Reinforcement Learning (VaPRL), which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder tasks. The agent also learns to reach the initial states proposed by the curriculum, minimizing the reliance on human intervention during learning. We observe that VaPRL reduces the interventions required by three orders of magnitude compared to episodic RL while outperforming prior state-of-the-art methods for reset-free RL, both in terms of sample efficiency and asymptotic performance, on a variety of simulated robotics problems.
