Deep Reinforcement Learning in Strategic Board Game Environments

In this paper we propose a novel Deep Reinforcement Learning (DRL) algorithm that builds on the concept of “action-dependent state features” and exploits it to approximate Q-values locally, employing a deep neural network with parallel Long Short-Term Memory (LSTM) components, each responsible for computing the Q-value of one action. As such, all computations occur simultaneously, and there is no need to employ “target” networks or experience replay, techniques regularly used in the DRL literature. Moreover, our algorithm does not require prior training experience, but trains itself online during game play. We tested our approach in Settlers of Catan, a multi-player strategic board game. Our results confirm the effectiveness of our approach, as it outperforms several competitors, including the state-of-the-art jSettler heuristic algorithm devised for this particular domain.
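
To make the described architecture concrete, the following is a minimal sketch, not the authors' implementation, of the "parallel per-action Q-value" idea: one LSTM branch per action, each mapping that action's state features to a scalar Q-value, so that all Q-values are produced in a single forward pass. It assumes PyTorch, and all names (PerActionQNetwork, feature_dim, hidden_dim, n_actions) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PerActionQNetwork(nn.Module):
    """Hypothetical sketch: one LSTM branch + linear head per action,
    each producing the Q-value of that action from its own feature sequence."""

    def __init__(self, feature_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.LSTM(feature_dim, hidden_dim, batch_first=True) for _ in range(n_actions)
        )
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, 1) for _ in range(n_actions)
        )

    def forward(self, action_features: torch.Tensor) -> torch.Tensor:
        # action_features: (batch, n_actions, seq_len, feature_dim),
        # i.e. one action-dependent feature sequence per available action.
        q_values = []
        for a, (lstm, head) in enumerate(zip(self.branches, self.heads)):
            out, _ = lstm(action_features[:, a])   # (batch, seq_len, hidden_dim)
            q_values.append(head(out[:, -1]))      # Q-value from the final time step
        return torch.cat(q_values, dim=1)          # (batch, n_actions)

if __name__ == "__main__":
    net = PerActionQNetwork(feature_dim=16, hidden_dim=32, n_actions=4)
    feats = torch.randn(2, 4, 5, 16)               # batch of 2, 4 actions, 5 time steps
    print(net(feats).shape)                        # torch.Size([2, 4])
```

Because every branch scores only its own action, the Q-values of all candidate actions come out of one simultaneous computation, which mirrors the "local" approximation described in the abstract; how the paper actually trains these branches online is not reproduced here.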
