Razvan Pascanu | Sergio Gómez Colmenarejo | Matthew W. Hoffman | Nando de Freitas | Yutian Chen | Caglar Gulcehre | Konrad Zolna | Jakub Sygnowski | Ziyu Wang | Thomas Paine
[1] Pieter Abbeel, et al. Towards Characterizing Divergence in Deep Q-Learning, 2019, ArXiv.
[2] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[3] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[4] Jiayu Zhou, et al. Ranking Policy Gradient, 2019, ICLR.
[5] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[6] S. Sathiya Keerthi, et al. Efficient algorithms for ranking with SVMs, 2010, Information Retrieval.
[7] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[8] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[9] Sergio Gomez Colmenarejo, et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, ArXiv.
[10] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[11] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[12] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[13] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[14] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[16] Tor Lattimore, et al. Behaviour Suite for Reinforcement Learning, 2019, ICLR.
[17] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[18] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[19] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[20] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, ArXiv.
[21] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[22] Dean Pomerleau. ALVINN, an autonomous land vehicle in a neural network, 1988, NIPS.
[23] Jürgen Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[24] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[25] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[26] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.
[27] Craig Boutilier, et al. ConQUR: Mitigating Delusional Bias in Deep Q-learning, 2020, ICML.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Sergey Levine, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.
[30] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[31] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[32] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[33] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[34] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[35] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[36] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[37] Gregory N. Hullender, et al. Learning to rank using gradient descent, 2005, ICML.
[38] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[39] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[40] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[41] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[42] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[43] Seyed Kamyar Seyed Ghasemipour, et al. EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL, 2020, ICML.
[44] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[45] Tie-Yan Liu, et al. Ranking Measures and Loss Functions in Learning to Rank, 2009, NIPS.
[46] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[47] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.