Automatic Generation of a Sub-optimal Agent Population with Learning

Most modern solutions for video game balancing are directed towards specific games. We are currently researching general methods for automatic multiplayer game balancing. The problem is modeled as a meta-game, where game-play changes the rules of another game. This way, a Machine Learning agent that learns to play the meta-game learns how to change a base game according to some balancing metric. One issue, however, lies in generating a high volume of game-play training data in which agents of different skill compete against each other. To this end we propose the automatic generation of a population of surrogate agents by learning sampling. In Reinforcement Learning, an agent learns in a trial-and-error fashion, gradually improving its policy, the mapping from world state to the action to perform. This means that at each successful evolutionary step an agent follows a sub-optimal strategy or, eventually, the optimal strategy. We store the agent's policy at the end of each training episode. The process is evaluated in simple environments with distinct properties. The quality of the generated population is evaluated by the diversity of the difficulty the agents have in solving their tasks.
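
The idea of storing the policy at the end of each training episode can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corridor environment, tabular Q-learning as the learner, the hyperparameters, and the names `train_population` and `steps_to_goal`. Each stored Q-table stands in for one surrogate agent, and the number of steps its frozen greedy policy needs to reach the goal serves as a rough proxy for the difficulty that agent has with the task.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # step left or step right
MAX_STEPS = 50        # episode length cap

def step(state, action):
    """Move along the corridor; reward 1 only when the goal cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Pick the highest-valued action index, breaking ties at random."""
    best = max(q[state])
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])

def train_population(episodes=30, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning that snapshots the Q-table after every episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    population = []
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            # Epsilon-greedy exploration (an assumed choice of behaviour policy).
            a = rng.randrange(2) if rng.random() < epsilon else greedy(q, state, rng)
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
        # Copy the table so later updates do not alter this stored agent.
        population.append([row[:] for row in q])
    return population

def steps_to_goal(q, seed=0):
    """Difficulty proxy: steps the frozen greedy policy takes, capped at MAX_STEPS."""
    rng = random.Random(seed)
    state = 0
    for t in range(1, MAX_STEPS + 1):
        state, _, done = step(state, ACTIONS[greedy(q, state, rng)])
        if done:
            return t
    return MAX_STEPS

population = train_population()
difficulty = [steps_to_goal(q) for q in population]
print(difficulty)
```

In this sketch, early snapshots tend to wander while later ones approach the shortest path, so the spread of values in `difficulty` gives one simple way to look at the diversity of the stored population.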