Computer Games

The asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic (A3C) algorithm, makes them exceptionally well suited to CPU computation. However, because deep reinforcement learning typically involves interpreting visual input, a large part of the training and inference time is spent performing convolutions. In this work we present our results on learning strategies in Atari games using a convolutional neural network, the Intel Math Kernel Library (MKL), and the TensorFlow framework. We also analyze the effects of asynchronous computation on the convergence of reinforcement learning algorithms.
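
As a concrete illustration of the kind of network whose convolutions dominate the CPU workload, the sketch below builds a small convolutional policy/value model for stacked 84x84 Atari frames in TensorFlow. This is a minimal sketch only: the abstract does not specify the exact network, so the filter sizes, strides, 256-unit hidden layer, and the function name build_atari_actor_critic are assumptions following the common A3C Atari setup.

import tensorflow as tf

def build_atari_actor_critic(num_actions, frame_stack=4):
    # Convolutional policy/value network in the style of the standard A3C
    # Atari setup (84x84 grayscale observations, 4 stacked frames). The exact
    # architecture used in this work is not stated in the abstract; the layer
    # sizes below are assumptions.
    inputs = tf.keras.Input(shape=(84, 84, frame_stack), name="stacked_frames")
    # These Conv2D layers are where most of the training and inference time
    # goes, and hence where MKL-optimized convolution kernels pay off on CPU.
    x = tf.keras.layers.Conv2D(16, 8, strides=4, activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(32, 4, strides=2, activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # Two heads: a softmax policy over the game's actions and a scalar value.
    policy_logits = tf.keras.layers.Dense(num_actions, name="policy_logits")(x)
    value = tf.keras.layers.Dense(1, name="value")(x)
    return tf.keras.Model(inputs, outputs=[policy_logits, value])

In an A3C-style run, several CPU worker threads would each execute a copy of this network against an Atari emulator and asynchronously apply gradient updates to a shared set of parameters; that asynchrony is the source of the convergence effects analyzed in the paper.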
