Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning
Juho Kannala, J. Pajarinen, Rinu Boney, A. Ilin, Yi Zhao