Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants

Pac-Xon is an arcade video game in which the player tries to fill a level space by conquering blocks while being threatened by enemies. This paper investigates whether a reinforcement learning (RL) agent can learn to play this game. The agent is a multilayer perceptron (MLP) that takes a feature representation of the game state as input and outputs a Q-value for each possible action. To train the agent, we compare Q-learning with two double Q-learning variants: the original algorithm and a novel variant. We also define an alternative, progressive reward function that gives higher rewards towards the end of a level, with the aim of improving the performance of the algorithms. The results show that all algorithms can successfully learn to play Pac-Xon. Both double Q-learning variants achieve significantly higher performance than Q-learning, and the progressive reward function does not yield better results than the regular reward function.
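
To make the comparison concrete, the following is a minimal sketch of how the training targets of Q-learning and the original double Q-learning algorithm differ. It is not the implementation used in the paper: the function names, the plain-list representation of Q-values, and the discount factor `gamma` are assumptions made for illustration, and the novel variant is not shown because the abstract does not describe it.

```python
def q_learning_target(reward, next_q, gamma, terminal):
    """Q-learning target: r + gamma * max_a Q(s', a).

    `next_q` is a list of Q-values for the next state, one per action
    (hypothetical representation, e.g. the outputs of the MLP).
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q)


def double_q_target(reward, next_q_a, next_q_b, gamma, terminal):
    """Double Q-learning target used when updating estimator A.

    Estimator A selects the greedy action in the next state,
    and estimator B supplies the value of that action.
    """
    if terminal:
        return reward
    best_action = max(range(len(next_q_a)), key=lambda a: next_q_a[a])
    return reward + gamma * next_q_b[best_action]


if __name__ == "__main__":
    # Hypothetical Q-values for three actions from two estimators.
    q_a = [1.0, 3.0, 2.0]
    q_b = [0.5, 1.5, 4.0]
    print(q_learning_target(1.0, q_a, 0.9, False))
    print(double_q_target(1.0, q_a, q_b, 0.9, False))
```

In this sketch the double Q-learning target is lower than the Q-learning target because the action chosen by one estimator is evaluated by the other, which is the mechanism double Q-learning uses to reduce the overestimation caused by taking a maximum over noisy estimates.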
