Reset-Free Lifelong Learning with Skill-Space Planning

The objective of \textit{lifelong} reinforcement learning (RL) is to optimize agents that can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose \textit{Lifelong Skill Planning} (LiSP), an algorithmic framework for lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, reducing the need for excessive real-world interaction. We demonstrate empirically that LiSP enables long-horizon planning and learns agents that avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
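To make the planning loop concrete, the following is a minimal sketch of model-predictive planning over skill latents. The interfaces \texttt{dynamics\_model(s, a)}, \texttt{skill\_policy(s, z)}, and \texttt{reward\_fn(s)}, as well as the random-shooting search and hyperparameters, are illustrative assumptions rather than the paper's exact procedure.

\begin{verbatim}
import numpy as np

def plan_in_skill_space(state, dynamics_model, skill_policy, reward_fn,
                        skill_dim=8, horizon=5, skill_steps=10,
                        n_candidates=256, rng=None):
    """Random-shooting planner over sequences of skill latents.

    Hypothetical interfaces (not taken from the paper):
      dynamics_model(s, a) -> predicted next state
      skill_policy(s, z)   -> low-level action for state s under skill z
      reward_fn(s)         -> scalar reward estimate for state s
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate skill sequences: (n_candidates, horizon, skill_dim)
    candidates = rng.normal(size=(n_candidates, horizon, skill_dim))
    returns = np.zeros(n_candidates)
    for i, skill_seq in enumerate(candidates):
        s = np.array(state, dtype=float)
        total = 0.0
        for z in skill_seq:                 # each skill is held for several steps
            for _ in range(skill_steps):
                a = skill_policy(s, z)      # action proposed by the skill policy
                s = dynamics_model(s, a)    # imagined rollout in the learned model
                total += reward_fn(s)
        returns[i] = total
    best = candidates[np.argmax(returns)]
    return best[0]  # execute the first skill, then replan (MPC style)
\end{verbatim}

Executing only the first skill of the best candidate and then replanning keeps the effective decision horizon short in skill space even when the underlying action-level horizon is long, which is the usual motivation for planning over temporally extended skills rather than primitive actions.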
