Conservative Q-Learning for Offline Reinforcement Learning
爱吃猫的鱼0, June 1, 2022, 11:11 p.m.
[1] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[2] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[3] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[4] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[5] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.
[6] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[7] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[8] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[11] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[12] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[13] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[14] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[15] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[16] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[17] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[18] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[19] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[20] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[21] Marek Petrik, et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[22] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[23] Martin A. Riedmiller, et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards, 2017, arXiv.
[24] John C. Duchi, et al. Variance-based Regularization with Convex Objectives, 2016, NIPS.
[25] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[26] Benjamin Van Roy, et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, 2016, ICML.
[27] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[28] Shie Mannor, et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[29] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[30] Mark S. Squillante, et al. A General Family of Robust Stochastic Operators for Reinforcement Learning, 2018, arXiv.
[31] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[32] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[33] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[34] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[35] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[36] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[37] Pieter Abbeel, et al. Towards Characterizing Divergence in Deep Q-Learning, 2019, arXiv.
[38] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[39] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[40] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, arXiv.
[41] Romain Laroche, et al. Safe Policy Improvement with Baseline Bootstrapping, 2017, ICML.
[42] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[43] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[44] Sergey Levine, et al. Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, 2019, CoRL.
[45] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[46] Sergey Levine, et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms, 2019, ICML.
[47] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, arXiv.
[48] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[49] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[50] Romain Laroche, et al. Safe Policy Improvement with Soft Baseline Bootstrapping, 2019, ECML/PKDD.
[51] Sergey Levine, et al. Accelerating Online Reinforcement Learning with Offline Datasets, 2020, arXiv.
[52] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[53] Nan Jiang, et al. Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison, 2020, arXiv:2003.03924.
[54] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[55] Rishabh Agarwal, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2019, ICML.
[56] Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[57] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[58] Romain Laroche, et al. Safe Policy Improvement with an Estimated Baseline Policy, 2019, AAMAS.
[59] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, arXiv.
[60] Sergey Levine, et al. DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction, 2020, NeurIPS.
[61] Tengyu Ma, et al. Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling, 2019, ICLR.
[62] Brendan O'Donoghue, et al. Variational Bayesian Reinforcement Learning with Regret Bounds, 2018, NeurIPS.