Reinforcement learning for qualitative group behaviours applied to non-player computer game characters

This thesis investigates how to train the increasingly large casts of characters in modern commercial computer games. A modern game can contain hundreds or even thousands of non-player characters, each of which should act coherently in a complex dynamic world and engage appropriately with other non-player characters and with human players. Too often it is obvious that computer-controlled characters are brainless zombies repeating the same hand-coded behaviour. Commercial computer games would seem a natural domain for reinforcement learning: now that the trend of selling games on graphical quality is peaking, with shelves saturated by titles with excellent graphics, better artificial intelligence looks like the next big thing.

The main contribution of this thesis is a novel style of utility function for reinforcement learning, the group utility function, which could provide automated behaviour specification for large numbers of computer game characters. Group utility functions allow arbitrary functions of the characters' performance to represent relationships between characters and groups of characters; these qualitative relationships are learned alongside the characters' main quantitative goal. Group utility functions can be seen both as a multi-agent extension of the existing programming-by-reward method and as a generalisation of the team utility function, replacing its sum with potentially any other function. Hierarchical group utility functions, which are group utility functions arranged in a tree structure, allow relationships between groups of characters to be learned.

For illustration, the empirical work uses the negative standard deviation as the group utility function to create balanced (equal-performance) behaviours. Experiments show that such a balancing group utility function can engender equal performance between individual characters, between groups, and between a group and a single character, and that some amount of quantitatively measured performance can be traded for qualitative behaviour. Further experiments show how the results degrade, as expected, as the number of characters and groups increases, and that approximating the learners' value functions with function approximation is one possible way to overcome these issues of scale. All experiments are undertaken in a commercially available computer game engine.

In summary, this thesis contributes a novel type of utility function potentially suitable for training many computer game characters, together with empirical work on reinforcement learning in a modern computer game engine.
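To make the construction concrete, the following is a minimal Python sketch of a hierarchical group utility function, not the thesis implementation (which runs inside a commercial game engine): all names and the scoring scheme are hypothetical. Leaves of the tree are characters with quantitative performance scores; every internal node contributes a negative-standard-deviation balancing term, shared by all characters beneath it, so balanced subgroups are rewarded and unbalanced ones penalised. Any other function could be substituted for the balancing term.

    from statistics import pstdev

    def hierarchical_group_utility(node, performance, weight=1.0):
        """Return (group_score, {character: reward}) for a group tree.

        Leaves are character names; internal nodes are lists of subtrees.
        Each internal node adds weight * -pstdev(child scores) to every
        character beneath it, so each character's reward combines its own
        quantitative performance with the qualitative group terms.
        """
        if isinstance(node, str):                    # leaf: one character
            return performance[node], {node: performance[node]}

        child_scores, rewards = [], {}
        for child in node:                           # recurse into subgroups
            score, sub = hierarchical_group_utility(child, performance, weight)
            child_scores.append(score)
            rewards.update(sub)

        balance = -pstdev(child_scores)              # group utility term
        for name in rewards:                         # shared by all members
            rewards[name] += weight * balance
        return sum(child_scores), rewards

    # Two teams whose performance should be balanced both within and
    # between teams (scores are illustrative, not experimental data).
    performance = {"a": 10.0, "b": 6.0, "c": 7.0, "d": 7.0}
    tree = [["a", "b"], ["c", "d"]]
    _, rewards = hierarchical_group_utility(tree, performance)
    print(rewards)   # {'a': 7.0, 'b': 3.0, 'c': 6.0, 'd': 6.0}

In this sketch the weight parameter plays the role of the trade-off discussed above: increasing it sacrifices more individually measured performance in exchange for more balanced group behaviour.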
