[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[2] Sergey Levine, et al. Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning, 2020, ICLR.
[3] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[4] S. Levine, et al. Accelerating Online Reinforcement Learning with Offline Datasets, 2020, ArXiv.
[5] Marc G. Bellemare, et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization, 2020, ArXiv.
[6] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[7] Nitish Srivastava, et al. Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning, 2021, ICML.
[8] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[9] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[10] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[11] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[12] Pieter Abbeel, et al. Towards Characterizing Divergence in Deep Q-Learning, 2019, ArXiv.
[13] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[14] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[15] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[16] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[17] Ruosong Wang, et al. What are the Statistical Limits of Offline RL with Linear Function Approximation?, 2020, ICLR.
[18] Tian Tian, et al. MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments, 2019.
[19] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[20] Ruosong Wang, et al. Instabilities of Offline RL with Pre-Trained Neural Representation, 2021, ICML.
[21] David Silver, et al. Online and Offline Reinforcement Learning by Planning with a Learned Model, 2021, NeurIPS.
[22] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[23] R. Held, et al. Movement-produced stimulation in the development of visually guided behavior, 1963, Journal of Comparative and Physiological Psychology.
[24] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[25] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[26] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[27] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[28] Emma Brunskill, et al. Provably Good Batch Reinforcement Learning Without Great Exploration, 2020, ArXiv.
[29] Joelle Pineau, et al. Interference and Generalization in Temporal Difference Learning, 2020, ICML.
[30] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[31] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[32] Hossein Mobahi, et al. The Low-Rank Simplicity Bias in Deep Networks, 2021, ArXiv.
[33] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[34] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[35] Yutaka Matsuo, et al. Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, 2020, ICLR.
[36] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[37] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[38] Alexander J. Smola, et al. Continuous Doubly Constrained Batch Reinforcement Learning, 2021, NeurIPS.
[39] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[40] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[41] Bo Dai, et al. On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data, 2021, ArXiv.
[42] Robin S. Spruce, et al. "Learning to be a Learner", 2021.
[43] Sergey Levine, et al. DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction, 2020, NeurIPS.
[44] Pablo Samuel Castro, et al. Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research, 2021, ICML.
[45] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[46] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.