Learning to Play: Reinforcement Learning and Games
[1] H. P.,et al. Mathematical Recreations , 1944, Nature.
[2] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.
[3] Claude E. Shannon,et al. Programming a computer for playing chess , 1950 .
[4] C. S. Strachey,et al. Logical or non-mathematical programmes , 1952, ACM '52.
[5] Allen Newell,et al. Elements of a theory of human problem solving. , 1958 .
[6] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological review.
[7] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[8] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[9] D. Hubel,et al. Shape and arrangement of columns in cat's striate cortex , 1963, The Journal of physiology.
[10] Daniel Edwards,et al. The Alpha-Beta Heuristic , 1963 .
[11] R. Bellman. On the Application of Dynamic Programming to the Determination of Optimal Play in Chess and Checkers , 1965, Proceedings of the National Academy of Sciences of the United States of America.
[12] D. Michie. Game-Playing and Game-Learning Automata , 1966.
[13] Joseph Weizenbaum,et al. ELIZA—a computer program for the study of natural language communication between man and machine , 1966, CACM.
[14] Donald E. Eastlake,et al. The Greenblatt chess program , 1967, AFIPS '67 (Fall).
[15] D. Hubel,et al. Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.
[16] Barbara J Huberman,et al. A program to play chess end games , 1968 .
[17] Morton D. Davis. Game Theory: A Nontechnical Introduction , 1970 .
[18] Richard Fikes,et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving , 1971, IJCAI.
[19] Richard Fikes,et al. Learning and Executing Generalized Robot Plans , 1993, Artif. Intell..
[20] Donald E. Knuth,et al. The art of computer programming: sorting and searching (volume 3) , 1973 .
[21] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974.
[22] John Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology , 1975.
[23] Donald E. Knuth,et al. The Solution for the Branching Factor of the Alpha-Beta Pruning Algorithm , 1981, ICALP.
[24] David B. Benson,et al. Life in the game of Go , 1976 .
[25] I. Witten. The apparent conflict between estimation and control—a survey of the two-armed bandit problem , 1976 .
[26] Hans J. Berliner,et al. Experiences in Evaluation with BKG - A Program that Plays Backgammon , 1977, IJCAI.
[27] Donald W. Loveland,et al. Automated theorem proving: a logical basis , 1978, Fundamental studies in computer science.
[28] A. Elo. The rating of chessplayers, past and present , 1978 .
[29] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[30] George C. Stockman,et al. A Minimax Algorithm Better than Alpha-Beta? , 1979, Artif. Intell..
[31] Larry S. Davis,et al. Pattern Databases , 1979, Data Base Design Techniques II.
[32] Judea Pearl,et al. SCOUT: A Simple Game-Searching Algorithm with Proven Optimal Properties , 1980, AAAI.
[33] Dana S. Nau. Pathology on Game Trees: A Summary of Results , 1980, AAAI.
[34] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[35] Hans J. Berliner,et al. Backgammon Computer Program Beats World Champion , 1980 .
[36] Aviezri S. Fraenkel,et al. Computing a Perfect Strategy for n*n Chess Requires Time Exponential in N , 1981, ICALP.
[37] T. Nitsche,et al. A Learning Chess Program , 1982.
[38] J J Hopfield,et al. Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.
[39] K. Coplan. A Special-Purpose Machine for an Improved Search Algorithm for Deep Chess Combinations , 1982.
[40] Bruno Buchberger,et al. Computer algebra symbolic and algebraic computation , 1982, SIGS.
[41] Judea Pearl,et al. On the Nature of Pathology in Game Searching , 1983, Artif. Intell..
[42] Bruce W. Ballard,et al. Non-Minimax Search Strategies for Use Against Fallible Opponents , 1983, AAAI.
[43] J. Ross Quinlan,et al. Learning Efficient Classification Procedures and Their Application to Chess End Games , 1983 .
[44] Bruce W. Ballard,et al. The *-Minimax Search Procedure for Trees Containing Chance Nodes , 1983, Artif. Intell..
[45] Donald F. Beal. Recent progress in understanding minimax search , 1983, ACM '83.
[46] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[47] Judea Pearl,et al. Heuristics : intelligent search strategies for computer problem solving , 1984 .
[48] John Philip Fishburn. Analysis of speedup in distributed algorithms , 1984 .
[49] Larry Wos,et al. Automated Reasoning: Introduction and Applications , 1984 .
[50] Lawrence J. Henschen,et al. What Is Automated Theorem Proving? , 1985, J. Autom. Reason..
[51] Gerald J. Sussman,et al. Structure and interpretation of computer programs , 1985, Proceedings of the IEEE.
[52] Ken Thompson,et al. Retrograde Analysis of Certain Endgames , 1986, J. Int. Comput. Games Assoc..
[53] G Schrüfer,et al. Presence and absence of pathology on game trees , 1986 .
[54] Rina Dechter,et al. Learning While Searching in Constraint-Satisfaction-Problems , 1986, AAAI.
[55] R. Geoff Dromey,et al. An algorithm for the selection problem , 1986, Softw. Pract. Exp..
[56] Jonathan Schaeffer,et al. Experiments in Search and Knowledge , 1986, J. Int. Comput. Games Assoc..
[57] Edward Hordern,et al. Sliding Piece Puzzles , 1987 .
[58] Ronald L. Rivest,et al. Game Tree Searching by Min/Max Approximation , 1987, Artif. Intell..
[59] Allen Newell,et al. SOAR: An Architecture for General Intelligence , 1987, Artif. Intell..
[60] Michael N. Katehakis,et al. The Multi-Armed Bandit Problem: Decomposition and Computation , 1987, Math. Oper. Res..
[61] Ingo Althöfer,et al. Root Evaluation Errors: How they Arise and Propagate , 1988, J. Int. Comput. Games Assoc..
[62] Sarit Kraus,et al. Diplomat, an agent in a multi agent environment: An overview , 1988, Seventh Annual International Phoenix Conference on Computers and Communications. 1988 Conference Proceedings.
[63] Donald Michie,et al. Machine Learning in the Next Five Years , 1988, EWSL.
[64] Hermann Kaindl,et al. Minimaxing: Theory and Practice , 1988, AI Mag..
[65] David A. McAllester. Conspiracy Numbers for Min-Max Search , 1988, Artif. Intell..
[66] Jack J. Dongarra,et al. An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.
[67] D. Nau,et al. Comparison of the minimax and product back-up rules in a variety of games , 1988 .
[68] Dana S. Nau,et al. A general branch-and-bound formulation for and/or graph and game tree search , 1988 .
[69] Gerald Tesauro,et al. Neurogammon Wins Computer Olympiad , 1989, Neural Computation.
[70] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[71] Donald L. Iglehart,et al. Importance sampling for stochastic simulations , 1989 .
[72] Jonathan Schaeffer,et al. The History Heuristic and Alpha-Beta Search Enhancements in Practice , 1989, IEEE Trans. Pattern Anal. Mach. Intell..
[73] W. S. McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[74] Murray Campbell,et al. Singular Extensions: Adding Selectivity to Brute-Force Searching , 1990, Artif. Intell..
[75] Hiroaki Kitano,et al. Designing Neural Networks Using Genetic Algorithms with Graph Generation System , 1990, Complex Syst..
[76] Craig A. Knoblock. Learning Abstraction Hierarchies for Problem Solving , 1990, AAAI.
[77] Donald F. Beal,et al. A Generalised Quiescence Search Algorithm , 1990, Artif. Intell..
[78] Wim Pijls,et al. Another View on the SSS* Algorithm , 1990, SIGAL International Symposium on Algorithms.
[79] Gerald Tesauro,et al. Neurogammon: a neural-network backgammon program , 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[80] Albert L. Zobrist,et al. A New Hashing Method with Application for Game Playing , 1990 .
[81] Murray Campbell,et al. Experiments with the Null-Move Heuristic , 1990 .
[82] Ken Chen,et al. Smart game board and go explorer: a study in software and knowledge engineering , 1990, Commun. ACM.
[83] F. Hsu,et al. A Grandmaster Chess Machine , 1990 .
[84] Bruce Abramson,et al. Expected-Outcome: A General Model of Static Evaluation , 1990, IEEE Trans. Pattern Anal. Mach. Intell..
[85] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell..
[86] Herbert D. Enderton. The Golem Go Program , 1991 .
[87] Dap Hartmann,et al. How Computers Play Chess , 1991, J. Int. Comput. Games Assoc..
[88] Austin Tate,et al. O-Plan: The open Planning Architecture , 1991, Artif. Intell..
[89] Yoshua Bengio,et al. Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[90] Sarit Kraus,et al. Negotiation in a non-cooperative environment , 1991, J. Exp. Theor. Artif. Intell..
[91] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1992, Math. Control. Signals Syst..
[92] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.
[93] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[94] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[95] Jean-Christophe Weill. The NegaC* Search , 1992, J. Int. Comput. Games Assoc..
[96] Stuart C. Shapiro. The Turing Test and the economist , 1992, SGAR.
[97] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[98] Lorien Y. Pratt,et al. Discriminability-Based Transfer between Neural Networks , 1992, NIPS.
[99] Jonathan Schaeffer,et al. A World Championship Caliber Checkers Program , 1992, Artif. Intell..
[100] Jaap van den Herik,et al. Heuristic programming in Artificial Intelligence 3: the third computer olympiad , 1992 .
[101] Robert Lake,et al. Solving Large Retrograde Analysis Problems Using a Network of Workstations , 1993 .
[102] B. Rost,et al. Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.
[103] Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
[104] Christian Donninger,et al. Null Move and Deep Search , 1993, J. Int. Comput. Games Assoc..
[105] J. Elman. Learning and development in neural networks: the importance of starting small , 1993, Cognition.
[106] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[107] Thomas Bäck,et al. An Overview of Evolutionary Algorithms for Parameter Optimization , 1993, Evolutionary Computation.
[108] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[109] S. Haykin,et al. Neural Networks: A Comprehensive Foundation , 1994.
[110] L. V. Allis,et al. Searching for solutions in games and artificial intelligence , 1994 .
[111] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.
[112] H. Jaap van den Herik,et al. Proof-Number Search , 1994, Artif. Intell..
[113] David B. Fogel,et al. An introduction to simulated evolutionary optimization , 1994, IEEE Trans. Neural Networks.
[114] Elwyn R. Berlekamp,et al. Mathematical Go - chilling gets the last point , 1994 .
[115] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[116] Shigeki Iwata,et al. The Othello game on an n*n board is PSPACE-complete , 1994, Theor. Comput. Sci..
[117] Aske Plaat,et al. Solution Trees as a Basis for Game-Tree Search , 1994, J. Int. Comput. Games Assoc..
[118] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[119] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[120] Gerald Tesauro,et al. TD-Gammon: A Self-Teaching Backgammon Program , 1995 .
[121] James L. McClelland,et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. , 1995, Psychological review.
[122] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[123] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[124] Sebastian Thrun,et al. Explanation-based neural network learning a lifelong learning approach , 1995 .
[125] Dan Boneh,et al. On genetic algorithms , 1995, COLT '95.
[126] Luca Maria Gambardella,et al. Ant-Q: A Reinforcement Learning Approach to the Traveling Salesman Problem , 1995, ICML.
[127] S. Yakowitz,et al. Machine learning and nonparametric bandit theory , 1995, IEEE Trans. Autom. Control..
[128] Jonathan Schaeffer,et al. CHINOOK: The World Man-Machine Checkers Champion , 1996, AI Mag..
[129] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[130] B. Pell. A Strategic Metagame Player for General Chess-Like Games , 1994, Comput. Intell..
[131] M. Buro. Statistical Feature Combination for the Evaluation of Game Positions , 1995, J. Int. Comput. Games Assoc..
[132] Johannes Fürnkranz,et al. Machine Learning in Computer Chess: The Next Generation , 1996, J. Int. Comput. Games Assoc..
[133] Jonathan Schaeffer,et al. Best-First Fixed-Depth Minimax Algorithms , 1996, J. Int. Comput. Games Assoc..
[134] Jonathan Schaeffer,et al. New advances in Alpha-Beta searching , 1996, CSC '96.
[135] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[136] Jonathan Schaeffer,et al. Exploiting Graph Properties of Game Trees , 1996, AAAI/IAAI, Vol. 1.
[137] Jordan B. Pollack,et al. Why did TD-Gammon Work? , 1996, NIPS.
[138] M. Enzenberger. The Integration of A Priori Knowledge into a Go Playing Neural Network , 1996 .
[139] Jaeyoung Choi,et al. PB-BLAS: a set of parallel block basic linear algebra subprograms , 1996, Concurr. Pract. Exp..
[140] Richard E. Korf,et al. Finding Optimal Solutions to the Twenty-Four Puzzle , 1996, AAAI/IAAI, Vol. 2.
[141] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .
[142] Ralph Gasser,et al. SOLVING NINE MEN'S MORRIS , 1996, Comput. Intell..
[143] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[144] Aske Plaat,et al. Research, Re: Search and Re-Search , 1996, J. Int. Comput. Games Assoc..
[145] Aske Plaat,et al. Programming Parallel Applications In Cilk , 1997 .
[146] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[147] Monty Newborn,et al. Crafty Goes Deep , 1997, J. Int. Comput. Games Assoc..
[148] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..
[149] Jonathan Schaeffer,et al. Search Versus Knowledge in Game-Playing Programs Revisited , 1997, IJCAI.
[150] Avi Pfeffer,et al. Representations and Solutions for Game-Theoretic Problems , 1997, Artif. Intell..
[151] Michael Buro. Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello , 1997 .
[152] Mark Brockington. KEYANO Unplugged -- The Construction of an Othello Program , 1997 .
[153] Jonathan Schaeffer,et al. Kasparov versus Deep Blue: The Rematch , 1997, J. Int. Comput. Games Assoc..
[154] Luca Maria Gambardella,et al. Ant colony system: a cooperative learning approach to the traveling salesman problem , 1997, IEEE Trans. Evol. Comput..
[155] Michael Buro,et al. The Othello Match of the Year: Takeshi Murakami vs. Logistello , 1997, J. Int. Comput. Games Assoc..
[156] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[157] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[158] Dana S. Nau,et al. Computer Bridge - A Big Win for AI Planning , 1998, AI Mag..
[159] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[160] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[161] Jonathan Schaeffer,et al. Opponent Modeling in Poker , 1998, AAAI/IAAI.
[162] J. Searle. Mind, Language, And Society: Philosophy In The Real World , 1998 .
[163] Lutz Prechelt,et al. Automatic early stopping using cross validation: quantifying the criteria , 1998, Neural Networks.
[164] Jonathan Baxter. KnightCap: A chess program that learns by combining TD(λ) with game-tree search , 1998.
[165] Alexander J. Smola,et al. Learning with kernels , 1998 .
[166] Michael I. Jordan. Graphical Models , 2003 .
[167] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[168] David B. Fogel,et al. Evolving neural networks to play checkers without relying on expert knowledge , 1999, IEEE Trans. Neural Networks.
[169] Kate Smith-Miles,et al. Neural Networks for Combinatorial Optimization: A Review of More Than a Decade of Research , 1999, INFORMS J. Comput..
[170] Doina Precup,et al. Using Options for Knowledge Transfer in Reinforcement Learning , 1999 .
[171] X. Yao. Evolving Artificial Neural Networks , 1999 .
[172] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[173] Donald F. Beal,et al. Learning Piece-square Values using Temporal Differences , 1999, J. Int. Comput. Games Assoc..
[174] Ernst A. Heinz. Adaptive Null-Move Pruning , 1999, J. Int. Comput. Games Assoc..
[175] Ken Chen,et al. Static Analysis of Life and Death in the Game of Go , 1999, Inf. Sci..
[176] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[177] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[178] Frank Dignum,et al. Deliberative Normative Agents: Principles and Architecture , 1999, ATAL.
[179] Rich Caruana,et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.
[180] Tsan-sheng Hsu,et al. Construction of Chinese Chess Endgame Databases by Retrograde Analysis , 2000, Computers and Games.
[181] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[182] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.
[183] Jonathan Schaeffer,et al. Unifying single-agent and two-player search , 2000, Inf. Sci..
[184] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[185] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[186] Guido Rossum,et al. Python Reference Manual , 2000 .
[187] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.
[188] Donald F. Beal,et al. Temporal Difference Learning for Heuristic Search and Game Playing , 2000, Inf. Sci..
[189] Richard E. Korf,et al. Recent Progress in the Design and Analysis of Admissible Heuristic Functions , 2000, AAAI/IAAI.
[190] Ernst A. Heinz,et al. New Self-Play Results in Computer Chess , 2000, Computers and Games.
[191] Erik van der Werf,et al. AI techniques for the game of Go , 2001 .
[192] Teun Koetsier,et al. On the prehistory of programmable machines: musical automata, looms, calculators , 2001 .
[193] E. Vesterinen,et al. Affective Computing , 2009, Encyclopedia of Biometrics.
[194] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[195] T. Patterson. The roots of āyurveda: selections from Sanskrit medical writings , 2001, Medical History.
[196] A. Giotis,et al. Low-Cost Stochastic Optimization for Engineering Applications , 2002.
[197] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[198] Jonathan Schaeffer,et al. The challenge of poker , 2002, Artif. Intell..
[199] Jose Miguel Puerta,et al. Ant colony optimization for learning Bayesian networks , 2002, Int. J. Approx. Reason..
[200] Barbara Webb,et al. Swarm Intelligence: From Natural to Artificial Systems , 2002, Connect. Sci..
[201] Michael Buro,et al. The evolution of strong othello programs , 2002, IWEC.
[202] Eric O. Postma,et al. Local Move Prediction in Go , 2002, Computers and Games.
[203] Gerald Tesauro,et al. Programming backgammon using self-teaching neural nets , 2002, Artif. Intell..
[204] Jürgen Schmidhuber,et al. Learning Nonregular Languages: A Comparison of Simple Recurrent Networks and LSTM , 2002, Neural Computation.
[205] George Tzanetakis,et al. Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..
[206] Tracy Brown,et al. The Embodied Mind: Cognitive Science and Human Experience , 2002, Cybern. Hum. Knowing.
[207] Feng-Hsiung Hsu,et al. Behind Deep Blue: Building the Computer that Defeated the World Chess Champion , 2002 .
[208] Noriyuki Kobayashi,et al. Cooperation and competition of agents in the auction of computer bridge , 2003 .
[209] Henri E. Bal,et al. Solving awari with parallel retrograde analysis , 2003, Computer.
[210] Kenji Doya,et al. Meta-learning in Reinforcement Learning , 2003, Neural Networks.
[211] Masakazu Matsugu,et al. Subject independent facial expression recognition with robust face detection using a convolutional neural network , 2003, Neural Networks.
[212] Bruno Bouzy,et al. Monte-Carlo Go Developments , 2003, ACG.
[213] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[214] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[215] Monty Newborn,et al. Deep Blue - an artificial intelligence milestone , 2012 .
[216] Jonathan Schaeffer,et al. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker , 2003, IJCAI.
[217] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[218] Peter Van Roy,et al. Concepts, Techniques, and Models of Computer Programming , 2004 .
[219] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[220] David Maxwell Chickering,et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.
[221] Cli McMahon,et al. Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence , 2004.
[222] R. Duke,et al. Policy games for strategic management , 2004 .
[223] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[224] Geoffrey E. Hinton,et al. Reinforcement Learning with Factored States and Actions , 2004, J. Mach. Learn. Res..
[225] Thomas Stützle,et al. Stochastic Local Search: Foundations & Applications , 2004 .
[226] Keechul Jung,et al. GPU implementation of neural networks , 2004, Pattern Recognit..
[227] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[228] J. Ross Quinlan,et al. Induction of Decision Trees , 1986, Machine Learning.
[229] Frank van Harmelen,et al. A semantic web primer , 2004 .
[230] Jonathan Schaeffer,et al. Rediscovering *-Minimax Search , 2004, Computers and Games.
[231] Jonathan Schaeffer,et al. Game-Tree Search with Adaptation in Stochastic Imperfect-Information Games , 2004, Computers and Games.
[232] Timothy Huang,et al. Experiments with learning opening strategy in the game of go , 2004, Int. J. Artif. Intell. Tools.
[233] Massimiliano Pontil,et al. Regularized multi--task learning , 2004, KDD.
[234] A. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.
[235] C. Koch. The quest for consciousness : a neurobiological approach , 2004 .
[236] Andrew Tridgell,et al. Learning to Play Chess Using Temporal Differences , 2000, Machine Learning.
[237] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[238] Greg Lindstrom,et al. Programming with Python , 2005, IT Professional.
[239] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[240] Sean Luke,et al. Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.
[241] Isabelle Bichindaritz,et al. Medical applications in case-based reasoning , 2005, The Knowledge Engineering Review.
[242] Jürgen Schmidhuber,et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.
[243] Johan Håstad,et al. On the power of small-depth threshold circuits , 1991, computational complexity.
[244] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[245] Bart Demoen,et al. Programming in Prolog. Using the ISO Standard. by William F. Clocksin, Christopher S. Mellish, Springer-Verlag, 2003, ISBN 3-540-00678-8, xiii+299 pages , 2005, Theory and Practice of Logic Programming.
[246] Ivan Bratko,et al. Bias and pathology in minimax search , 2005, Theor. Comput. Sci..
[247] Ricardo Vilalta,et al. A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.
[248] Dana S. Nau,et al. Experiments on alternatives to minimax , 2005, International Journal of Parallel Programming.
[249] Michael C. Fu,et al. An Adaptive Sampling Algorithm for Solving Markov Decision Processes , 2005, Oper. Res..
[250] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[251] David B. Fogel,et al. Further Evolution of a Self-Learning Chess Program , 2005, CIG.
[252] Tristan Cazenave,et al. Combining Tactical Search and Monte-Carlo in the Game of Go , 2005, CIG.
[253] Michael R. Genesereth,et al. General Game Playing: Overview of the AAAI Competition , 2005, AI Mag..
[254] Philipp Slusallek,et al. Introduction to real-time ray tracing , 2005, SIGGRAPH Courses.
[255] Bram Bakker,et al. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization , 2003 .
[256] Daphne Koller,et al. Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks , 2005, UAI.
[257] David B. Fogel,et al. The Blondie25 Chess Program Competes Against Fritz 8.0 and a Human Chess Master , 2006, 2006 IEEE Symposium on Computational Intelligence and Games.
[258] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.
[259] Tuomas Sandholm,et al. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-Time Equilibrium Computation , 2006, AAAI.
[260] Catholijn M. Jonker,et al. An agent architecture for multi-attribute negotiation using incomplete preference information , 2007, Autonomous Agents and Multi-Agent Systems.
[261] Rich Caruana,et al. Model compression , 2006, KDD '06.
[262] Jürgen Schmidhuber,et al. Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts , 2005 .
[263] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[264] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[265] Gregory Chaitin,et al. The limits of reason. , 2006, Scientific American.
[266] John Tromp,et al. Combinatorics of Go , 2006, Computers and Games.
[267] Thore Graepel,et al. Bayesian pattern ranking for move prediction in the game of Go , 2006, ICML.
[268] Olivier Teytaud,et al. Modification of UCT with Patterns in Monte-Carlo Go , 2006 .
[269] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[270] Shane Legg,et al. A Collection of Definitions of Intelligence , 2007, AGI.
[271] Sridhar Mahadevan,et al. Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.
[272] James H. Moor,et al. The Dartmouth College Artificial Intelligence Conference: The Next Fifty Years , 2006, AI Mag..
[273] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[274] Massimiliano Pontil,et al. Multi-Task Feature Learning , 2006, NIPS.
[275] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[276] Rémi Coulom. Monte-Carlo Tree Search in Crazy Stone , 2007 .
[277] David Silver,et al. Combining online and offline knowledge in UCT , 2007, ICML '07.
[278] Stephan Schiffel,et al. Fluxplayer: A Successful General Game Player , 2007, AAAI.
[279] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[280] Shi-Chun Tsai,et al. On the fairness and complexity of generalized k-in-a-row games , 2007, Theor. Comput. Sci..
[281] Tom M. Mitchell,et al. The Need for Biases in Learning Generalizations , 2007 .
[282] International Foundation for Autonomous Agents and MultiAgent Systems (IFAAMAS) , 2007.
[283] Eric A. Hansen,et al. Anytime Heuristic Search , 2011, J. Artif. Intell. Res..
[284] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.
[285] Joost Broekens,et al. Emotion and Reinforcement: Affective Facial Expressions Facilitate Robot Learning , 2007, Artifical Intelligence for Human Computing.
[286] Richard E. Neapolitan,et al. Learning Bayesian networks , 2007, KDD '07.
[287] Michael H. Bowling,et al. Regret Minimization in Games with Incomplete Information , 2007, NIPS.
[288] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.
[289] Mark Harman,et al. The Current State and Future of Search Based Software Engineering , 2007, Future of Software Engineering (FOSE '07).
[290] Mauro Birattari,et al. Swarm Intelligence , 2012, Lecture Notes in Computer Science.
[291] Yoshua. Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[292] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.
[293] Johannes Fürnkranz,et al. Learning of Piece Values for Chess Variants , 2008 .
[294] Pieter Spronck,et al. Monte-Carlo Tree Search: A New Framework for Game AI , 2008, AIIDE.
[295] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[296] Nathan S. Netanyahu,et al. Genetic algorithms for mentor-assisted evaluation function optimization , 2008, GECCO '08.
[297] H. Jaap van den Herik,et al. Parallel Monte-Carlo Tree Search , 2008, Computers and Games.
[298] Leslie G. Valiant,et al. Knowledge Infusion: In Pursuit of Robustness in Artificial Intelligence , 2008, FSTTCS.
[299] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[300] Tibor Bosse,et al. Formalisation of Damasio’s theory of emotion, feeling and core consciousness , 2008, Consciousness and Cognition.
[301] Sunita Sarawagi. Learning with Graphical Models , 2008 .
[302] Ilya Sutskever,et al. Mimicking Go Experts with Convolutional Neural Networks , 2008, ICANN.
[303] H. Jaap van den Herik,et al. Single-Player Monte-Carlo Tree Search , 2008, Computers and Games.
[304] David Silver,et al. Achieving Master Level Play in 9×9 Computer Go , 2008, AAAI.
[305] Jonathan Schaeffer,et al. One Jump Ahead: Computer Perfection at Checkers , 2008 .
[306] H. Jaap van den Herik,et al. Progressive Strategies for Monte-Carlo Tree Search , 2008 .
[307] J. Huizinga. Homo ludens: proeve eener bepaling van het spel-element der cultuur (Homo Ludens: A Study of the Play-Element in Culture) , 2008 .
[308] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[309] Yngvi Björnsson,et al. CadiaPlayer: A Simulation-Based General Game Player , 2009, IEEE Transactions on Computational Intelligence and AI in Games.
[310] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[311] Joel Veness,et al. Bootstrapping from Game Tree Search , 2009, NIPS.
[312] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[313] Nils J. Nilsson,et al. The Quest for Artificial Intelligence , 2009 .
[314] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[315] J. Broekens,et al. Assistive social robots in elderly care: a review , 2009 .
[316] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[317] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[318] Marco Scutari,et al. Learning Bayesian Networks with the bnlearn R Package , 2009, 0908.3817.
[319] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SKDD.
[320] Luís Seabra Lopes,et al. DarkBlade: A Program That Plays Diplomacy , 2009, EPIA.
[321] Burr Settles,et al. Active Learning Literature Survey , 2009 .
[322] David P. Helmbold,et al. All-Moves-As-First Heuristics in Monte-Carlo Go , 2009, IC-AI.
[323] Frank L. Lewis,et al. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.
[324] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[325] Martin Müller. Fuego at the Computer Olympiad in Pamplona 2009: A Tournament Report , 2009 .
[326] Pieter Spronck,et al. Monte-Carlo Tree Search in Settlers of Catan , 2009, ACG.
[327] Mark H. M. Winands,et al. Quiescence Search for Stratego , 2009 .
[328] Ricardo Vilalta,et al. Metalearning - Applications to Data Mining , 2008, Cognitive Technologies.
[329] Dana S. Nau,et al. Error Minimizing Minimax : Avoiding Search Pathology in Game Trees , 2009 .
[330] Kai A. Krueger,et al. Flexible shaping: How learning in small steps helps , 2009, Cognition.
[331] Daniel Michulke,et al. Neural Networks for State Evaluation in General Game Playing , 2009, ECML/PKDD.
[332] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[333] Martin Müller,et al. A Lock-Free Multithreaded Monte-Carlo Tree Search Algorithm , 2009, ACG.
[334] Michael Thielscher. Answer Set Programming for Single-Player Games in General Game Playing , 2009, ICLP.
[335] David Silver,et al. Reinforcement Learning and Simulation Based Search in the Game of Go , 2009 .
[336] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[337] Julian Togelius,et al. Multiobjective exploration of the StarCraft map space , 2010, Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games.
[338] Michael Thielscher,et al. A General Game Description Language for Incomplete Information Games , 2010, AAAI.
[339] Yngvi Björnsson,et al. Learning Simulation Control in General Game-Playing Agents , 2010, AAAI.
[340] Sarit Kraus,et al. Can automated agents proficiently negotiate with humans? , 2010, CACM.
[341] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
[342] Marco Wiering. Self-Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning , 2010, J. Intell. Learn. Syst. Appl..
[343] J. O’Neill,et al. Play it again: reactivation of waking experience and memory , 2010, Trends in Neurosciences.
[344] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[345] Leslie Pack Kaelbling,et al. Hierarchical Planning in the Now , 2010, Bridging the Gap Between Task and Motion Planning.
[346] Nicola Beume,et al. Towards Intelligent Team Composition and Maneuvering in Real-Time Strategy Games , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[347] Julian Togelius,et al. Search-Based Procedural Content Generation , 2010, EvoApplications.
[348] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[349] Hendrik Baier,et al. The Power of Forgetting: Improving the Last-Good-Reply Policy in Monte Carlo Go , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[350] Jie Cheng,et al. CUDA by Example: An Introduction to General-Purpose GPU Programming , 2010, Scalable Comput. Pract. Exp..
[351] Martin Müller,et al. Fuego—An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[352] Tuomas Sandholm,et al. The State of Solving Large Incomplete-Information Games, and Application to Poker , 2010, AI Mag..
[353] Li Fei-Fei,et al. ImageNet: Constructing a large-scale image database , 2010 .
[354] Julien Kloetzer. Monte-Carlo Opening Books for Amazons , 2010, Computers and Games.
[355] Ryan B. Hayward,et al. Monte Carlo Tree Search in Hex , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[356] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[357] Thomas Bartz-Beielstein,et al. Experimental Methods for the Analysis of Optimization Algorithms , 2010 .
[358] Bart Selman,et al. On Adversarial Search Spaces and Sampling-Based Planning , 2010, ICAPS.
[359] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[360] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.
[361] Olivier Teytaud,et al. Special Issue on Monte Carlo Techniques and Computer Go , 2010, IEEE Trans. Comput. Intell. AI Games.
[362] Fons J. Verbeek,et al. Pattern Recognition for High Throughput Zebrafish Imaging Using Genetic Algorithm Optimization , 2010, PRIB.
[363] Nando de Freitas,et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.
[364] Richard B. Segal,et al. On the Scalability of Parallel UCT , 2010, Computers and Games.
[365] Joel Veness,et al. Monte-Carlo Planning in Large POMDPs , 2010, NIPS.
[366] Shang-Rong Tsai,et al. Current Frontiers in Computer Go , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[367] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[368] Petr Baudis,et al. Balancing MCTS by Dynamically Adjusting the Komi Value , 2011, J. Int. Comput. Games Assoc..
[369] Petr Baudis,et al. PACHI: State of the Art Open Source Go Program , 2011, ACG.
[370] Damien Pellier,et al. MCTS Experiments on the Voronoi Game , 2011, ACG.
[371] Ian D. Watson,et al. Computer poker: A review , 2011, Artif. Intell..
[372] Alan Fern,et al. Ensemble Monte-Carlo Planning: An Empirical Study , 2011, ICAPS.
[373] Mohamed Chtourou,et al. On the training of recurrent neural networks , 2011, Eighth International Multi-Conference on Systems, Signals & Devices.
[374] Arno J. Knobbe,et al. Non-redundant Subgroup Discovery in Large and Complex Data , 2011, ECML/PKDD.
[375] Richard J. Lorentz. Experiments with Monte-Carlo Tree Search in the Game of Havannah , 2011, J. Int. Comput. Games Assoc..
[376] Stefan Schaal,et al. Hierarchical reinforcement learning with movement primitives , 2011, 2011 11th IEEE-RAS International Conference on Humanoid Robots.
[377] B. V. Bowden (ed.). Faster Than Thought (including the chapter Digital Computers Applied to Games) , 2011 .
[378] Kevin Leyton-Brown,et al. Sequential Model-Based Optimization for General Algorithm Configuration , 2011, LION.
[379] Christopher D. Rosin,et al. Multi-armed bandits with episode context , 2011, Annals of Mathematics and Artificial Intelligence.
[380] Perry R. Cook,et al. Real-time human interaction with supervised learning algorithms for music composition and performance , 2011 .
[381] Tuomas Sandholm,et al. Game theory-based opponent modeling in large imperfect-information games , 2011, AAMAS.
[382] H. V. van Vlijmen,et al. Which Compound to Select in Lead Optimization? Prospectively Validated Proteochemometric Models Guide Preclinical Development , 2011, PloS one.
[383] Hrafn Eiríksson,et al. Investigation of Multi-Cut Pruning in Game-Tree Search , 2011 .
[384] Kamil Rocki,et al. Large-Scale Parallel Monte Carlo Tree Search on GPU , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[385] Sabine Kastner,et al. Human consciousness and its relationship to social neuroscience: A novel hypothesis , 2011, Cognitive neuroscience.
[386] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell..
[387] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[388] Richard J. Lorentz. An MCTS Program to Play EinStein Würfelt Nicht! , 2011, ACG.
[389] Maarten Sierhuis,et al. Beyond Cooperative Robotics: The Central Role of Interdependence in Coactive Design , 2011, IEEE Intelligent Systems.
[390] G. Kalyanaram,et al. Nudge: Improving Decisions about Health, Wealth, and Happiness , 2011 .
[391] Michael Thielscher. The General Game Playing Description Language Is Universal , 2011, IJCAI.
[392] Huajun Chen,et al. The Semantic Web , 2011, Lecture Notes in Computer Science.
[393] Lutz Prechelt,et al. Early Stopping - But When? , 2012, Neural Networks: Tricks of the Trade.
[394] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[395] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..
[396] Michael Thielscher,et al. HyperPlay: A Solution to General Game Playing with Imperfect Information , 2012, AAAI.
[397] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.
[398] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[399] Razvan Pascanu,et al. Theano: Deep Learning on GPUs with Python , 2012 .
[400] Robert Babuska,et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[401] Yoshua Bengio,et al. Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.
[402] Ronald Parr,et al. Greedy Algorithms for Sparse Reinforcement Learning , 2012, ICML.
[403] Julian Togelius,et al. The Mario AI Benchmark and Competitions , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[404] Tomáš Mikolov. Statistical Language Models Based on Neural Networks , 2012, PhD thesis, Brno University of Technology.
[405] Richard S. Sutton,et al. Temporal-difference search in computer Go , 2012, Machine Learning.
[406] Michèle Sebag,et al. The grand challenge of computer Go , 2012, Commun. ACM.
[407] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[408] Yee Whye Teh,et al. Actor-Critic Reinforcement Learning with Energy-Based Policies , 2012, EWRL.
[409] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[410] Dimitri P. Bertsekas,et al. Rollout Algorithms for Discrete Optimization: A Survey , 2012 .
[411] Simon Colton,et al. The Painting Fool: Stories from Building an Automated Painter , 2012 .
[412] Alex Graves,et al. Supervised Sequence Labelling , 2012 .
[413] Song Yu,et al. Sparse Matrix-Vector Multiplication on NVIDIA GPU , 2012 .
[414] Michael Johanson,et al. Measuring the Size of Large No-Limit Poker Games , 2013, ArXiv.
[415] Christopher Archibald,et al. Monte Carlo *-Minimax Search , 2013, IJCAI.
[416] Marco Wiering,et al. Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[417] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[418] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[419] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[420] Carlos Cotta,et al. A review of computational intelligence in RTS games , 2013, 2013 IEEE Symposium on Foundations of Computational Intelligence (FOCI).
[421] Michel Gendreau,et al. Hyper-heuristics: a survey of the state of the art , 2013, J. Oper. Res. Soc..
[422] Martin Zinkevich,et al. The Annual Computer Poker Competition , 2013, AI Mag..
[423] Sarit Kraus,et al. Evaluating practical negotiating agents: Results and analysis of the 2011 international competition , 2013, Artif. Intell..
[424] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[425] Alex Alves Freitas,et al. Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms , 2013, Genetic Programming and Evolvable Machines.
[426] Édouard Bonnet,et al. On the Complexity of Trick-Taking Card Games , 2013, IJCAI.
[427] Qiang Yang,et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms , 2013, AAAI Spring Symposium: Lifelong Machine Learning.
[428] Santiago Ontañón,et al. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft , 2013, IEEE Transactions on Computational Intelligence and AI in Games.
[429] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[430] Marc G. Bellemare,et al. Bayesian Learning of Recursively Factored Environments , 2013, ICML.
[431] H. Jaap van den Herik,et al. Improving multivariate Horner schemes with Monte Carlo tree search , 2012, Comput. Phys. Commun..
[432] Kevin Leyton-Brown,et al. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms , 2012, KDD.
[433] I. Good. A Five-Year Plan for Automatic Chess , 2013 .
[434] Shih-Chieh Huang,et al. MoHex 2.0: A Pattern-Based MCTS Hex Player , 2013, Computers and Games.
[435] H. Jaap van den Herik,et al. Investigations with Monte Carlo Tree Search for Finding Better Multivariate Horner Schemes , 2013, ICAART.
[436] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[437] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[438] Zhiwei Qin,et al. Sparse Reinforcement Learning via Convex Optimization , 2014, ICML.
[439] H. Jaap van den Herik,et al. HEPGAME and the Simplification of Expressions , 2014, ArXiv.
[440] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[441] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[442] P. Baldi,et al. Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.
[443] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[444] Jonathan Schaeffer,et al. A New Paradigm for Minimax Search , 2014, ArXiv.
[445] H. Jaap van den Herik,et al. Genetic Algorithms for Evolving Computer Chess Programs , 2014, IEEE Transactions on Evolutionary Computation.
[446] Jack J. Dongarra,et al. Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores , 2014, ICS '14.
[447] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[448] Vedran Dunjko,et al. Quantum speedup for active learning agents , 2014, 1401.4997.
[449] Ian H. Witten,et al. Data Mining: Practical Machine Learning Tools and Techniques , 2014 .
[450] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[451] Sarit Kraus,et al. Genius: An Integrated Environment for Supporting the Design of Generic Automated Negotiators , 2012, Comput. Intell..
[452] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[453] Peter A. Flach,et al. Subgroup Discovery in Smart Electricity Meter Data , 2014, IEEE Transactions on Industrial Informatics.
[454] Steven te Brinke,et al. Monte Carlo Tree Search , 2014 .
[455] Diego Klabjan,et al. Skill-based differences in spatio-temporal team behaviour in defence of the Ancients 2 (DotA 2) , 2014, 2014 IEEE Games Media Entertainment.
[456] D. Hambrick,et al. Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions: A Meta-Analysis , 2014, Psychological Science.
[457] H. Jaap van den Herik,et al. Combining Simulated Annealing and Monte Carlo Tree Search for Expression Simplification , 2013, ICAART.
[458] Rasoul Karimi,et al. Active Learning for Recommender Systems , 2015, KI - Künstliche Intelligenz.
[459] Hesham El-Deeb,et al. A Comparative Study of Game Tree Searching Methods , 2014 .
[460] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[461] Neil Burch,et al. Heads-up limit hold’em poker is solved , 2015, Science.
[462] David Silver,et al. Move Evaluation in Go Using Deep Convolutional Neural Networks , 2014, ICLR.
[463] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[464] H. Jaap van den Herik,et al. Scaling Monte Carlo Tree Search on Intel Xeon Phi , 2015, 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS).
[465] Ralf Funke. From Gobble to Zen: The Quest for Truly Intelligent Software and the Monte Carlo Revolution in Go , 2015 .
[466] D. Jonge. Negotiations over large agreement spaces , 2015 .
[467] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, 1512.07679.
[468] Philip H. S. Torr,et al. An embarrassingly simple approach to zero-shot learning , 2015, ICML.
[469] Matthew Lai,et al. Giraffe: Using Deep Reinforcement Learning to Play Chess , 2015, ArXiv.
[470] Hao Wang,et al. Optimally Weighted Cluster Kriging for Big Data Regression , 2015, IDA.
[471] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[472] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[473] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[474] Aaron Klein,et al. Efficient and Robust Automated Machine Learning , 2015, NIPS.
[475] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.
[476] Navdeep Jaitly,et al. Pointer Networks , 2015, NIPS.
[477] Garrison W. Cottrell,et al. Basic Level Categorization Facilitates Visual Object Recognition , 2015, ArXiv.
[478] Marco Platzner,et al. Adaptive Playouts in Monte-Carlo Tree Search with Policy-Gradient Reinforcement Learning , 2015, ACG.
[479] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[480] H. Jaap van den Herik,et al. Past Our Prime: A Study of Age and Play Style Development in Battlefield 3 , 2015, IEEE Transactions on Computational Intelligence and AI in Games.
[481] Amos J. Storkey,et al. Training Deep Convolutional Neural Networks to Play Go , 2015, ICML.
[482] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[483] Santiago Ontañón,et al. A Benchmark for StarCraft Intelligent Agents , 2015 .
[484] Luc De Raedt,et al. Neural-Symbolic Learning and Reasoning: Contributions and Challenges , 2015, AAAI Spring Symposia.
[485] Bojun Huang,et al. Pruning Game Tree by Rollouts , 2015, AAAI.
[486] Simon M. Lucas,et al. Open Loop Search for General Video Game Playing , 2015, GECCO.
[487] Michael Thielscher,et al. Lifting Model Sampling for General Game Playing to Incomplete-Information Models , 2015, AAAI.
[488] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[489] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[490] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[491] H. Jaap van den Herik,et al. Transfer Learning of Air Combat Behavior , 2015, 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA).
[492] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[493] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[494] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[495] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[496] John D. Kelleher,et al. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies , 2015 .
[497] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[498] Yuandong Tian,et al. Better Computer Go Player with Neural Network and Long-term Prediction , 2016, ICLR.
[499] Daan Wierstra,et al. One-Shot Generalization in Deep Generative Models , 2016, ICML.
[500] Nando de Freitas,et al. Taking the Human Out of the Loop: A Review of Bayesian Optimization , 2016, Proceedings of the IEEE.
[501] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[502] Chiara F. Sironi,et al. Comparison of rapid action value estimation variants for general game playing , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).
[503] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[504] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[505] Christin Wirth,et al. Blondie24: Playing at the Edge of AI , 2016 .
[506] H. Jaap van den Herik,et al. Ensemble UCT Needs High Exploitation , 2015, ICAART.
[507] Daan Wierstra,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.
[508] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[509] Nando de Freitas,et al. Bayesian Optimization in a Billion Dimensions via Random Embeddings , 2013, J. Artif. Intell. Res..
[510] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[511] Leslie Pérez Cáceres,et al. The irace package: Iterated racing for automatic algorithm configuration , 2016 .
[512] Alex Graves,et al. Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.
[513] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[514] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[515] Michael S. Lew,et al. Deep learning for visual understanding: A review , 2016, Neurocomputing.
[516] Thomas G. Dietterich,et al. Incorporating Expert Feedback into Active Anomaly Discovery , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[517] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[518] Nathan S. Netanyahu,et al. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess , 2016, ICANN.
[519] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[520] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[521] Andreas Müller,et al. Introduction to Machine Learning with Python: A Guide for Data Scientists , 2016 .
[522] Peter L. Bartlett,et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[523] John Tromp,et al. A Googolplex of Go Games , 2016, Computers and Games.
[524] Aske Plaat,et al. On the Impact of Data Set Size in Transfer Learning Using Deep Neural Networks , 2016, IDA.
[525] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[526] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[527] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[528] Julian Togelius,et al. The 2014 General Video Game Playing Competition , 2016, IEEE Transactions on Computational Intelligence and AI in Games.
[529] Koen V. Hindriks,et al. Automated Negotiating Agents Competition (ANAC) , 2017, AAAI.
[530] Chiara F. Sironi,et al. On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing , 2017, CGW@IJCAI.
[531] Aurélien Géron,et al. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems , 2017 .
[532] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.
[533] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[534] Nikhil Ketkar,et al. Deep Learning with Python , 2017 .
[535] Alex Graves,et al. Automated Curriculum Learning for Neural Networks , 2017, ICML.
[536] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[537] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[538] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[539] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[540] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[541] C A Nelson,et al. Learning to Learn , 2017, Encyclopedia of Machine Learning and Data Mining.
[542] Yuandong Tian,et al. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.
[543] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[544] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[545] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[546] Ramesh Raskar,et al. Designing Neural Network Architectures using Reinforcement Learning , 2016, ICLR.
[547] Catholijn M. Jonker,et al. Efficient exploration with Double Uncertain Value Networks , 2017, ArXiv.
[548] M. Kubát. An Introduction to Machine Learning , 2017, Springer International Publishing.
[549] David Barber,et al. Thinking Fast and Slow with Deep Learning and Tree Search , 2017, NIPS.
[550] Dave de Jonge,et al. D-Brane: a Diplomacy playing agent for automated negotiations research , 2017 .
[551] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[552] Malcolm I. Heywood,et al. Multi-task learning in Atari video games with emergent tangled program graphs , 2017, GECCO.
[553] Lina J. Karam,et al. A Study and Comparison of Human and Deep Learning Recognition Performance under Visual Distortions , 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).
[554] Bernt Schiele,et al. Zero-Shot Learning — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[555] Tao Zhang,et al. A Survey of Model Compression and Acceleration for Deep Neural Networks , 2017, ArXiv.
[556] S. Baum. A Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy , 2017 .
[557] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[558] Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.
[559] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[560] Sergey Ioffe,et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.
[561] H. Jaap van den Herik,et al. Structured Parallel Programming for Monte Carlo Tree Search , 2017, ArXiv.
[562] Xiaoming Liu,et al. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[563] Geoffrey E. Hinton,et al. Distilling a Neural Network Into a Soft Decision Tree , 2017, CEx@AI*IA.
[564] Razvan Pascanu,et al. Learning model-based planning from scratch , 2017, ArXiv.
[565] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[566] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[567] Lars Kotthoff,et al. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA , 2017, J. Mach. Learn. Res..
[568] Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning , 2017, ICLR 2018.
[569] Peter Henderson,et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.
[570] H. Jaap van den Herik,et al. An Analysis of Virtual Loss in Parallel MCTS , 2017, ICAART.
[571] Simon M. Lucas,et al. General Video Game AI: Learning from screen capture , 2017, 2017 IEEE Congress on Evolutionary Computation (CEC).
[572] Yunguan Fu,et al. Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization , 2018, ArXiv.
[573] Chih-Cheng Lai,et al. Comparison of machine learning models for the prediction of mortality of patients with unplanned extubation in intensive care units , 2018, Scientific Reports.
[574] Aske Plaat,et al. Priming Digitisation: Learning the Textual Structure in Field Books , 2018 .
[575] Frank Hutter,et al. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari , 2018, IJCAI.
[576] Catholijn M. Jonker,et al. The Potential of the Return Distribution for Exploration in RL , 2018, ArXiv.
[577] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[578] Sergey Levine,et al. Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.
[579] Wu Chen,et al. A Search Optimization Method for Rule Learning in Board Games , 2018, PRICAI.
[580] Tuomas Sandholm,et al. Depth-Limited Solving for Imperfect-Information Games , 2018, NeurIPS.
[581] Hui Wang,et al. Assessing the Potential of Classical Q-learning in General Game Playing , 2018, BNCAI.
[582] Wenlong Fu,et al. Model-based reinforcement learning: A survey , 2018 .
[583] Rob Fergus,et al. Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning , 2018, ArXiv.
[584] H. Jaap van den Herik,et al. Pipeline Pattern for Parallel MCTS , 2018, ICAART.
[585] Sergey Levine,et al. Meta-Reinforcement Learning of Structured Exploration Strategies , 2018, NeurIPS.
[586] Matteo Hessel,et al. Deep Reinforcement Learning and the Deadly Triad , 2018, ArXiv.
[587] Bas van Stein,et al. Automatic Configuration of Deep Neural Networks with EGO , 2018, ArXiv.
[588] Jaakko Lehtinen,et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.
[589] M. Hutson. Artificial intelligence faces reproducibility crisis. , 2018, Science.
[590] Michael I. Jordan,et al. RLlib: Abstractions for Distributed Reinforcement Learning , 2017, ICML.
[591] Rémi Munos,et al. Learning to Search with MCTSnets , 2018, ICML.
[592] Malcolm I. Heywood,et al. Emergent Tangled Program Graphs in Multi-Task Learning , 2018, IJCAI.
[593] Henry Charlesworth,et al. Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information , 2018, ArXiv.
[594] Joel Z. Leibo,et al. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning , 2018, ArXiv.
[595] Wojciech Samek,et al. Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..
[596] Catholijn M. Jonker,et al. Monte Carlo Tree Search for Asymmetric Trees , 2018, ArXiv.
[597] Koen V. Hindriks,et al. StarCraft as a Testbed for Engineering Complex Distributed Systems Using Cognitive Agent Technology , 2018, AAMAS.
[598] Geraint Rees,et al. Clinically applicable deep learning for diagnosis and referral in retinal disease , 2018, Nature Medicine.
[599] Sergey Levine,et al. Unsupervised Meta-Learning for Reinforcement Learning , 2018, ArXiv.
[600] Julian Togelius,et al. Artificial Intelligence and Games , 2018, Springer International Publishing.
[601] Jakub W. Pachocki,et al. Emergent Complexity via Multi-Agent Competition , 2017, ICLR.
[602] Elmar Eisemann,et al. DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks , 2018, IEEE Transactions on Visualization and Computer Graphics.
[603] Takayuki Ito,et al. The Challenge of Negotiation in the Game of Diplomacy , 2018, AT.
[604] Ben Ruijl,et al. Games and loop integrals , 2018, Journal of Physics: Conference Series.
[605] H. Jaap van den Herik,et al. A Lock-free Algorithm for Parallel MCTS , 2018, ICAART.
[606] Noam Brown,et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.
[607] Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2017, ICLR.
[608] Mike Preuss,et al. Planning chemical syntheses with deep neural networks and symbolic AI , 2017, Nature.
[609] Kiminori Matsuzaki. Empirical Analysis of PUCT Algorithm with Evaluation Functions of Different Quality , 2018, 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI).
[610] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[611] Youhei Akimoto,et al. Probabilistic Model-Based Dynamic Architecture Search , 2018 .
[612] Mélanie Frappier,et al. The Book of Why: The New Science of Cause and Effect , 2018, Science.
[613] Catholijn M. Jonker,et al. A0C: Alpha Zero in Continuous Action Space , 2018, ArXiv.
[614] D. Weinshall,et al. Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks , 2018, ICML.
[615] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[616] Quoc V. Le,et al. Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.
[617] S. Levine,et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning , 2019, CoRL.
[618] Frank Hutter,et al. Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..
[619] Joelle Pineau,et al. No Press Diplomacy: Modeling Multi-Agent Gameplay , 2019, NeurIPS.
[620] Ruiyang Xu,et al. Learning Self-Game-Play Agents for Combinatorial Optimization Problems , 2019, AAMAS.
[621] Joel Z. Leibo,et al. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research , 2019, ArXiv.
[622] Noam Brown,et al. Superhuman AI for multiplayer poker , 2019, Science.
[623] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[624] Ruth Nussinov,et al. Computational Structural Biology: Successes, Future Directions, and Challenges , 2019, Molecules.
[625] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[626] Sergey Levine,et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables , 2019, ICML.
[627] Joelle Pineau,et al. Online Adaptative Curriculum Learning for GANs , 2018, AAAI.
[628] Pieter Abbeel,et al. Benchmarking Model-Based Reinforcement Learning , 2019, ArXiv.
[629] Tom Eccles,et al. An investigation of model-free planning , 2019, ICML.
[630] Rémi Munos,et al. Recurrent Experience Replay in Distributed Reinforcement Learning , 2018, ICLR.
[631] Tong Lu,et al. On Reinforcement Learning for Full-length Game of StarCraft , 2018, AAAI.
[632] Kouichi Sakurai,et al. One Pixel Attack for Fooling Deep Neural Networks , 2017, IEEE Transactions on Evolutionary Computation.
[633] Mike Preuss,et al. Alternative Loss Functions in AlphaZero-like Self-play , 2019, 2019 IEEE Symposium Series on Computational Intelligence (SSCI).
[634] Jakub W. Pachocki,et al. Dota 2 with Large Scale Deep Reinforcement Learning , 2019, ArXiv.
[635] Amos J. Storkey,et al. How to train your MAML , 2018, ICLR.
[636] Heike Trautmann,et al. Automated Algorithm Selection: Survey and Perspectives , 2018, Evolutionary Computation.
[637] Kenneth O. Stanley,et al. Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.
[638] Tim Salimans,et al. Policy Gradient Search: Online Planning and Expert Iteration without Search Trees , 2019, ArXiv.
[639] Elliot Meyerson,et al. Evolving Deep Neural Networks , 2017, Artificial Intelligence in the Age of Neural Networks and Brain Computing.
[640] Guy Lever,et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.
[641] Ryan B. Hayward,et al. Hex: The Full Story , 2019 .
[642] Xin Yang,et al. Exposing Deep Fakes Using Inconsistent Head Poses , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[643] David J. Wu,et al. Accelerating Self-Play Learning in Go , 2019, ArXiv.
[644] Yuxi Li,et al. Deep Reinforcement Learning , 2018, Reinforcement Learning for Cyber-Physical Systems.
[645] Dennis J. N. J. Soemers,et al. Strategic Features for General Games , 2019, KEG@AAAI.
[646] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[647] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[648] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[649] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[650] Junhyuk Oh,et al. Discovering Reinforcement Learning Algorithms , 2020, NeurIPS.
[651] H. Sangani,et al. Do Androids Dream of Electric Sheep? , 2020, Faculty Brat.
[652] Sebastian Risi,et al. From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI , 2020, KI - Künstliche Intelligenz.
[653] Zeb Kurth-Nelson,et al. A distributional code for value in dopamine-based reinforcement learning , 2020, Nature.
[654] Yoram Bachrach,et al. Learning to Play No-Press Diplomacy with Best Response Policy Iteration , 2020, NeurIPS.
[655] Mike Preuss,et al. Model-Based Deep Reinforcement Learning for High-Dimensional Problems, a Survey , 2020, ArXiv.
[656] Matthew E. Taylor,et al. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey , 2020, J. Mach. Learn. Res..
[657] Hui Wang,et al. Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play? , 2020, ArXiv.
[658] John Schulman,et al. Teacher–Student Curriculum Learning , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[659] T. L. Lai,et al. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.