David Silver | Julian Schrittwieser | Ioannis Antonoglou | Mohammadamin Barekatain | Thomas Hubert | Amol Mandhane
[1] Frank Hutter, et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[2] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[3] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[4] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[5] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[6] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[7] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[8] Yutaka Matsuo, et al. Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, 2020, ICLR.
[9] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[10] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[11] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[12] Qiang He, et al. POPO: Pessimistic Offline Policy Optimization, 2022, ICASSP.
[13] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[14] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[15] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[16] Karen Simonyan, et al. Off-Policy Actor-Critic with Shared Experience Replay, 2020, ICML.
[17] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[18] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[19] Gabriel Dulac-Arnold, et al. Model-Based Offline Planning, 2020, ArXiv.
[20] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[21] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[22] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[23] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[24] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[25] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[28] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[29] Sergio Gomez Colmenarejo, et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, ArXiv.
[30] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning and reacting, 1990, ACM SIGART Bulletin.
[31] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[32] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[33] Alan Fern, et al. DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs, 2020, ICLR.
[34] David Silver, et al. Learning and Planning in Complex Action Spaces, 2021, ICML.
[35] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[36] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[37] Chelsea Finn, et al. Offline Reinforcement Learning from Images with Latent Space Models, 2020, L4DC.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.