Paper information - Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants

Pac-Xon is an arcade video game in which the player tries to fill a level space by conquering blocks while being threatened by enemies. This paper investigates whether a reinforcement learning (RL) agent can successfully learn to play this game. The RL agent consists of a multilayer perceptron (MLP) that takes a feature representation of the game state as input variables and outputs a Q-value for each possible action. For training the agent, Q-learning is compared to two double Q-learning variants: the original algorithm and a novel variant. Furthermore, we have set up an alternative, progressive reward function that gives higher rewards towards the end of a level, to try to increase the performance of the algorithms. The results show that all algorithms can be used to successfully learn to play Pac-Xon. Both double Q-learning variants perform significantly better than Q-learning, while the progressive reward function does not yield better results than the regular reward function.
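
The difference between Q-learning and the original double Q-learning algorithm lies in how the learning target is computed. The following is a minimal sketch, not the paper's implementation: it assumes the standard textbook update targets, uses plain Python lists in place of the MLP's Q-value outputs, and all function names and numeric values are hypothetical. The novel variant and the progressive reward function are not specified in the abstract, so they are not sketched here.

```python
def q_learning_target(reward, next_q, gamma, done):
    """Q-learning target: r + gamma * max_a Q(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q)


def double_q_target(reward, next_q_select, next_q_eval, gamma, done):
    """Double Q-learning target for one of the two estimators.

    The estimator being updated selects the greedy next action,
    and the other estimator supplies the value of that action.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_select)), key=lambda a: next_q_select[a])
    return reward + gamma * next_q_eval[best_action]


# Hypothetical Q-value outputs for the next state, one entry per action.
next_q_a = [1.0, 3.0, 2.0]
next_q_b = [0.5, 1.5, 4.0]

q_target = q_learning_target(1.0, next_q_a, gamma=0.9, done=False)
dq_target = double_q_target(1.0, next_q_a, next_q_b, gamma=0.9, done=False)
```

In this made-up example the Q-learning target uses the largest value of a single estimator, whereas the double Q-learning target uses the second estimator's value for the action the first one prefers. Decoupling selection from evaluation in this way is what double Q-learning relies on to reduce the overestimation caused by the max operator.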
