Reinforcement learning with adaptive Kanerva coding for Xpilot game AI

The Xpilot-AI video game platform allows the creation of artificially intelligent, autonomous control agents. At the same time, the Xpilot environment is highly complex, with a very large number of state variables and action choices. Basic reinforcement learning (RL) techniques are of limited use in such large state and action spaces, since the repeated exposure on which their value updates depend can proceed very slowly. To address this problem, state abstractions are often generated, allowing learning to proceed more quickly, but they typically require the programmer to hand-craft state representations, reward functions, and action choices in an ad hoc manner. We apply an automated technique for generating useful abstractions for learning: adaptive Kanerva coding. This method uses a small subset of the original states as a proxy for the full environment, updating values over these representative prototype states in a manner analogous to Q-learning. Over time, the set of prototypes is adjusted automatically to provide more effective coverage and abstraction. Our results show that this technique allows a simple learning agent to double its survival time when navigating the Xpilot environment, using only a small fraction of the full state space as a stand-in and greatly increasing the potential for more rapid learning.
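
To make the approach concrete, the following is a minimal sketch, in Python, of how a Kanerva-coded Q-learner with prototype adaptation might be organized. It assumes binary state features, Hamming-distance activation of prototypes, and a simple visit-count rule for replacing rarely used prototypes; these choices, along with all names and parameter values, are illustrative assumptions rather than the exact method or settings reported here.

```python
"""Sketch of adaptive Kanerva coding for Q-learning (illustrative, not the
paper's exact algorithm): binary state vectors, Hamming-distance activation,
and a visit-count heuristic for prototype replacement."""
import random


class AdaptiveKanervaQ:
    def __init__(self, state_dim, actions, n_prototypes=200, radius=2,
                 alpha=0.1, gamma=0.95):
        self.actions = list(actions)
        self.radius = radius
        self.alpha = alpha
        self.gamma = gamma
        # Prototypes: a small random sample standing in for the full state space.
        self.prototypes = [tuple(random.randint(0, 1) for _ in range(state_dim))
                           for _ in range(n_prototypes)]
        # One weight per (prototype, action); Q(s, a) sums weights of active prototypes.
        self.theta = [[0.0] * len(self.actions) for _ in range(n_prototypes)]
        # Visit counts drive the adaptation step.
        self.visits = [0] * n_prototypes

    def _active(self, state):
        """Indices of prototypes within the Hamming radius of the given state."""
        return [i for i, p in enumerate(self.prototypes)
                if sum(a != b for a, b in zip(p, state)) <= self.radius]

    def q_value(self, state, a_idx, active=None):
        active = self._active(state) if active is None else active
        return sum(self.theta[i][a_idx] for i in active)

    def choose_action(self, state, epsilon=0.1):
        """Epsilon-greedy action selection over the abstract Q-values."""
        if random.random() < epsilon:
            return random.randrange(len(self.actions))
        active = self._active(state)
        return max(range(len(self.actions)),
                   key=lambda a: self.q_value(state, a, active))

    def update(self, state, a_idx, reward, next_state):
        """Q-learning-style update, with the TD error spread over active prototypes."""
        active = self._active(state)
        if not active:
            return
        best_next = max(self.q_value(next_state, a)
                        for a in range(len(self.actions)))
        delta = reward + self.gamma * best_next - self.q_value(state, a_idx, active)
        share = self.alpha * delta / len(active)
        for i in active:
            self.theta[i][a_idx] += share
            self.visits[i] += 1

    def adapt(self, rare=1, frequent=50):
        """Replace rarely visited prototypes with perturbed copies of busy ones,
        refocusing the prototype set on regions the agent actually visits."""
        busy = [i for i, v in enumerate(self.visits) if v >= frequent]
        for i, v in enumerate(self.visits):
            if v <= rare and busy:
                src = random.choice(busy)
                proto = list(self.prototypes[src])
                flip = random.randrange(len(proto))
                proto[flip] ^= 1                      # perturb one feature
                self.prototypes[i] = tuple(proto)
                self.theta[i] = list(self.theta[src])
        self.visits = [0] * len(self.prototypes)
```

Spreading the temporal-difference error evenly across the active prototypes keeps each update local to the visited region of the state space, while a periodic call to `adapt()` concentrates the prototype set where the agent spends its time, which is the intuition behind letting a small prototype set stand in for the full environment.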
