Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games

While traditionally a labour-intensive task, the testing of game content is becoming progressively more automated. Among the many directions this automation is taking, automated play-testing is one of the most promising, thanks in part to advances in supervised and reinforcement learning (RL) algorithms. However, these algorithms, while extremely powerful, often suffer in production environments from issues of reliability and transparency in their training and usage. In this work we investigate and evaluate strategies for applying the popular RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game, with a specific focus on improving its reliability during training and its generalization during game playing. We implemented and tested a number of strategies against a real-world mobile puzzle game (Lily's Garden from Tactile Games). We isolated the conditions that lead to failures in either training or generalization during testing, and we identified several strategies that ensure more stable behaviour of the algorithm in this game genre.
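
For context on the method the work builds on, the core of PPO is a clipped surrogate objective that limits how far each policy update can move from the policy that collected the data. The following is a minimal sketch of that loss in PyTorch; the function name, tensor shapes, and the clip_eps value of 0.2 are illustrative assumptions, not the configuration used in this study.

```python
# Minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Hyperparameters and tensor shapes below are illustrative assumptions.
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """L^CLIP = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    # Probability ratio r_t(theta) between the updated and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside [1 - eps, 1 + eps].
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic bound and negate it, since optimizers minimize.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random tensors standing in for a batch of game transitions.
batch = 64
old_log_probs = torch.randn(batch)
new_log_probs = old_log_probs + 0.1 * torch.randn(batch)
advantages = torch.randn(batch)
print(ppo_clipped_loss(new_log_probs, old_log_probs, advantages).item())
```

In a full training loop this policy loss would typically be combined with a value-function loss and an entropy bonus, and optimized over several epochs of minibatches drawn from the collected rollouts.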
