David Silver | Julian Schrittwieser | Ioannis Antonoglou | Mohammadamin Barekatain | Simon Schmitt | Thomas Hubert
[1] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[2] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[4] Nicolas Le Roux, et al. An operator view of policy gradient methods, 2020, NeurIPS.
[5] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[6] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[7] Alan Fern, et al. Dream and Search to Control: Latent Space Planning for Continuous Control, 2020, ArXiv.
[8] Nir Levine, et al. An empirical investigation of the challenges of real-world reinforcement learning, 2020, ArXiv.
[9] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[10] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[11] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[12] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[13] H. Jaap van den Herik, et al. Progressive Strategies for Monte-Carlo Tree Search, 2008.
[14] Yunhao Tang, et al. Discretizing Continuous Action Space for On-Policy Optimization, 2019, AAAI.
[15] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[16] Frank Hutter, et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[17] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[18] Mohammad Norouzi, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.
[19] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[20] Yoshua Bengio, et al. Probabilistic Planning with Sequential Monte Carlo methods, 2018, ICLR.
[21] Michal Valko, et al. Monte-Carlo Tree Search as Regularized Policy Optimization, 2020, ICML.
[22] D. Rubin, et al. The calculation of posterior distributions by data augmentation, 1987.
[23] Junhyuk Oh, et al. A Self-Tuning Actor-Critic Algorithm, 2020, NeurIPS.
[24] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[25] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[26] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[27] Jackie Kay, et al. Local Search for Policy Iteration in Continuous Control, 2020, ArXiv.
[28] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[29] Byron Boots, et al. Information Theoretic Model Predictive Q-Learning, 2020, L4DC.
[30] Martin A. Riedmiller, et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models, 2019, CoRL.
[31] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[32] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[33] Nicolas Heess, et al. Hierarchical visuomotor control of humanoids, 2018, ICLR.
[34] Peng Wei, et al. Continuous Control for Searching and Planning with a Learned Model, 2020, ArXiv.
[35] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[36] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[37] H. Francis Song, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control, 2019, ICLR.
[38] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[39] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[40] Richard Evans, et al. Deep Reinforcement Learning in Large Discrete Action Spaces, 2015, ArXiv (1512.07679).
[41] Catholijn M. Jonker, et al. A0C: Alpha Zero in Continuous Action Space, 2018, ArXiv.
[42] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.