Mastering the game of Go without human knowledge
David Silver | Julian Schrittwieser | Karen Simonyan | Ioannis Antonoglou | Aja Huang | Arthur Guez | Thomas Hubert | Lucas Baker | Matthew Lai | Adrian Bolton | Yutian Chen | Timothy Lillicrap | Fan Hui | Laurent Sifre | George van den Driessche | Thore Graepel | Demis Hassabis
[1] R. A. Howard. Dynamic Programming and Markov Processes, 1960.
[2] A. L. Samuel. Some studies in machine learning using the game of checkers. II: Recent progress, 1967.
[3] Gerald Tesauro. Neurogammon: a neural-network backgammon program, 1990, IJCNN International Joint Conference on Neural Networks.
[4] Andrew G. Barto, et al. Monte Carlo Matrix Inversion and Reinforcement Learning, 1993, NIPS.
[5] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.
[6] Gerald Tesauro. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[7] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[8] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[9] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[10] M. Enzenberger. The Integration of A Priori Knowledge into a Go Playing Neural Network, 1996.
[11] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.
[12] Michael Buro. From Simple Features to Sophisticated Evaluation Functions, 1998, Computers and Games.
[13] Richard Hans Robert Hahnloser, et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, 2000, Nature.
[14] Jonathan Schaeffer, et al. Temporal Difference Learning Applied to a High-Performance Game-Playing Program, 2001, IJCAI.
[15] Martin Müller. Computer Go, 2002, Artif. Intell.
[16] Haixun Wang, et al. Empirical comparison of various reinforcement learning strategies for sequential targeted marketing, 2002, Proceedings of the IEEE International Conference on Data Mining.
[17] Brian Sheppard. World-championship-caliber Scrabble, 2002, Artif. Intell.
[18] Markus Enzenberger. Evaluation in Go by a Neural Network using Soft Segmentation, 2003, ACG.
[19] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[20] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 2004, Machine Learning.
[21] Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.
[22] Andrew Tridgell, et al. Learning to Play Chess Using Temporal Differences, 2000, Machine Learning.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[25] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[26] Rémi Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[27] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[28] Jacek Mandziuk. Computational Intelligence in Mind Games, 2007, Challenges for Computational Intelligence.
[29] Rémi Coulom. Computing "Elo Ratings" of Move Patterns in the Game of Go, 2007, J. Int. Comput. Games Assoc.
[30] Rémi Coulom. Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength, 2008, Computers and Games.
[31] Joel Veness, et al. Bootstrapping from Game Tree Search, 2009, NIPS.
[32] Flavien Balbo, et al. Using a Monte Carlo approach for bus regulation, 2009, 12th International IEEE Conference on Intelligent Transportation Systems.
[33] David Silver. Reinforcement learning and simulation-based search in computer Go, 2009.
[34] Richard B. Segal. On the Scalability of Parallel UCT, 2010, Computers and Games.
[35] Trevor Hastie, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009.
[36] N. Le Fort-Piat, et al. The world of independent learners is not Markovian, 2011, Int. J. Knowl. Based Intell. Eng. Syst.
[37] Christopher D. Rosin. Multi-armed bandits with episode context, 2011, Annals of Mathematics and Artificial Intelligence.
[38] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[39] David Silver, et al. Monte-Carlo tree search and rapid action value estimation in computer Go, 2011, Artif. Intell.
[40] Richard S. Sutton, et al. Temporal-difference search in computer Go, 2012, Machine Learning.
[41] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[42] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[43] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[44] David Silver, et al. Concurrent Reinforcement Learning from Customer Interactions, 2013, ICML.
[45] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[46] Bruno Scherrer. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[47] David Silver, et al. Move Evaluation in Go Using Deep Convolutional Neural Networks, 2014, ICLR.
[48] Matthew Lai. Giraffe: Using Deep Reinforcement Learning to Play Chess, 2015, ArXiv.
[49] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[50] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[51] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[52] Amos J. Storkey, et al. Training Deep Convolutional Neural Networks to Play Go, 2015, ICML.
[53] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[54] Yuandong Tian, et al. Better Computer Go Player with Neural Network and Long-term Prediction, 2016, ICLR.
[55] Nando de Freitas, et al. Taking the Human Out of the Loop: A Review of Bayesian Optimization, 2016, Proceedings of the IEEE.
[56] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[58] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[59] David Silver, et al. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, 2016, ArXiv.
[60] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[61] Shimon Whiteson, et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning, 2017, ICML.
[62] Vladlen Koltun, et al. Learning to Act by Predicting the Future, 2016, ICLR.
[63] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[64] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[65] Tristan Cazenave. Residual Networks for Computer Go, 2018, IEEE Transactions on Games.