Opponent Modelling in the Game of Tron using Reinforcement Learning

In this paper we propose the use of vision grids as a state representation for learning to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique that is used to predict the opponent’s next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of performing a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (ELU). The results show that the ELU activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed and in most cases also improve the agent’s performance compared to using the full grid as the state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent’s actions, which, in combination with Monte-Carlo roll-outs, significantly increases the agent’s performance.
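As a rough illustration of the roll-out procedure described above, the sketch below (in Python) simulates the game n steps ahead, sampling the opponent's moves from the learned opponent model and evaluating states with the trained value network. The helper callables (`step`, `opponent_model`, `q_value`, `actions`) are hypothetical placeholders and do not come from the paper; this is a minimal sketch of the idea, not the authors' implementation.

```python
def rollout_value(state, action, step, opponent_model, q_value, actions,
                  n_steps=10, n_rollouts=20):
    """Estimate the expected value of taking `action` in `state` via roll-outs.

    step(state, a, opp_a)   -> (next_state, reward, done): simulates one move pair
    opponent_model(state)   -> predicted opponent move (learned model)
    q_value(state, a)       -> value estimate from the trained network
    actions(state)          -> the agent's legal moves in `state`
    All of these are illustrative placeholders.
    """
    total = 0.0
    for _ in range(n_rollouts):
        s, a = state, action
        value = None
        for _ in range(n_steps):
            opp_a = opponent_model(s)            # opponent move predicted by the model
            s, reward, done = step(s, a, opp_a)  # advance the simulated game
            if done:
                value = reward                   # terminal outcome (win/loss/draw)
                break
            a = max(actions(s), key=lambda b: q_value(s, b))  # greedy own move
        if value is None:
            value = max(q_value(s, b) for b in actions(s))    # bootstrap after n steps
        total += value
    return total / n_rollouts
```

The action with the highest averaged roll-out value would then be selected for execution, in line with the Monte-Carlo roll-out scheme summarised in the abstract.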
