Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Aviral Kumar | Justin Fu | George Tucker | Sergey Levine
[1] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.
[2] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[3] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[4] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[5] Rémi Munos, et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[7] A. Antos, et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[8] James Bennett, et al. The Netflix Prize, 2007.
[9] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[11] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[12] Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res..
[13] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[14] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[15] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res..
[16] Karl Tuyls, et al. The importance of experience replay database composition in deep reinforcement learning, 2015.
[17] Martha White, et al. Emphatic Temporal-Difference Learning, 2015, ArXiv.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[19] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[20] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[21] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, ArXiv.
[22] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[23] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[24] Trevor Darrell, et al. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling, 2018, ArXiv.
[25] Yang Gao, et al. Reinforcement Learning from Imperfect Demonstrations, 2018, ICLR.
[26] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[27] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[28] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[29] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[30] Yifan Wu, et al. Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment, 2019, ICML.
[31] Nomi Ringach, et al. Prioritized Experience Replay via Learnability Approximation, 2019.
[32] Sergey Levine, et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms, 2019, ICML.
[33] Zachary C. Lipton, et al. What is the Effect of Importance Weighting in Deep Learning?, 2018, ICML.
[34] Jordi Grau-Moya, et al. Soft Q-Learning with Mutual-Information Regularization, 2018, ICLR.
[35] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[36] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[37] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[38] Romain Laroche, et al. Safe Policy Improvement with Baseline Bootstrapping, 2019, ICML.