Tim Salimans | Thomas Anthony | Robert Nishihara | Philipp Moritz | John Schulman
[1] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[2] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[3] Peter Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[4] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[6] Rémi Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[7] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[8] David Silver. Reinforcement Learning and Simulation-Based Search in Computer Go, 2009.
[9] Ryan B. Hayward, et al. Monte Carlo Tree Search in Hex, 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[10] Richard B. Segal. On the Scalability of Parallel UCT, 2010, Computers and Games.
[11] Nataliya Sokolovska, et al. Continuous Upper Confidence Trees, 2011, LION.
[12] Christopher D. Rosin. Multi-armed Bandits with Episode Context, 2011, Annals of Mathematics and Artificial Intelligence.
[13] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[14] Jakub Pawlewicz, et al. Scalable Parallel DFPN Search, 2013, Computers and Games.
[15] Shih-Chieh Huang, et al. MoHex 2.0: A Pattern-Based MCTS Hex Player, 2013, Computers and Games.
[16] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[17] Marco Platzner, et al. Adaptive Playouts in Monte-Carlo Tree Search with Policy-Gradient Reinforcement Learning, 2015, ACG.
[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Matthieu Geist, et al. Approximate Modified Policy Iteration and Its Application to the Game of Tetris, 2015, J. Mach. Learn. Res.
[20] Kenny Young, et al. Neurohex: A Deep Q-learning Hex Agent, 2016, CGW@IJCAI.
[21] Demis Hassabis, et al. Mastering the Game of Go with Deep Neural Networks and Tree Search, 2016, Nature.
[22] Masahito Yamamoto, et al. Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex, 2017, Conference on Technologies and Applications of Artificial Intelligence (TAAI).
[23] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, arXiv.
[24] David Barber, et al. Thinking Fast and Slow with Deep Learning and Tree Search, 2017, NIPS.
[25] Demis Hassabis, et al. Mastering the Game of Go without Human Knowledge, 2017, Nature.
[26] Yee Whye Teh, et al. Distral: Robust Multitask Reinforcement Learning, 2017, NIPS.
[27] John Schulman, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[28] Chao Gao, et al. Adversarial Policy Gradient for Alternating Markov Games, 2018, ICLR.
[29] Martin Müller, et al. Move Prediction Using Deep Convolutional Neural Networks in Hex, 2018, IEEE Transactions on Games.
[30] Sergey Levine, et al. Divide-and-Conquer Reinforcement Learning, 2017, ICLR.
[31] Michael I. Jordan, et al. Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.
[32] Demis Hassabis, et al. A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play, 2018, Science.