Adaptive Kanerva-based function approximation for multi-agent systems

In this paper, we show how adaptive prototype optimization can improve the performance of function approximation based on Kanerva Coding when solving large-scale instances of classic multi-agent problems. We apply our techniques to the predator-prey pursuit problem. We first demonstrate that Kanerva Coding applied within a reinforcement learner does not, by itself, give good results. We then describe our new adaptive Kanerva-based function approximation algorithm, which relies on prototype deletion and generation. We show that probabilistic prototype deletion with random prototype generation increases the fraction of test instances that are solved from 45% to 90%, and that prototype splitting increases that fraction to 94%. We also show that optimizing prototypes reduces the number of prototypes, and therefore the number of features, needed to achieve a 90% solution rate by up to 87%. These results demonstrate that our approach can dramatically improve the quality of the results obtained while reducing the number of prototypes required. We conclude that adaptive prototype optimization can greatly improve a Kanerva-based reinforcement learner's ability to solve large-scale multi-agent problems.
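To make the ideas concrete, the following is a minimal sketch of Kanerva-style feature coding with adaptive prototype optimization (probabilistic deletion of rarely visited prototypes, random regeneration, and splitting of heavily visited ones). The class name, distance measure, thresholds, and update rules are illustrative assumptions for exposition, not the exact algorithm evaluated in the paper.

```python
import numpy as np

class AdaptiveKanervaCoder:
    """Sketch of Kanerva coding with prototype deletion, generation, and splitting."""

    def __init__(self, n_prototypes, state_dim, radius, state_range=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.radius = radius
        self.state_range = state_range
        # Prototypes are random points in a discretised state space (assumed grid world).
        self.prototypes = self.rng.integers(0, state_range, size=(n_prototypes, state_dim))
        self.visits = np.zeros(n_prototypes)  # how often each prototype has been activated

    def features(self, state):
        """Binary feature vector: 1 for every prototype within `radius` of the state."""
        dists = np.abs(self.prototypes - state).sum(axis=1)  # Manhattan distance
        active = (dists <= self.radius).astype(float)
        self.visits += active
        return active

    def adapt(self, delete_fraction=0.1, split=True):
        """Periodically replace rarely visited prototypes.

        Deletion is probabilistic: prototypes with few visits are more likely to be
        removed. Replacements are either fresh random prototypes or perturbed copies
        ("splits") of the most-visited prototype. Both rules are assumed forms that
        mirror the deletion/generation/splitting ideas described in the abstract.
        """
        n = len(self.prototypes)
        k = max(1, int(delete_fraction * n))
        p = 1.0 / (1.0 + self.visits)   # deletion probability inversely related to visits
        p /= p.sum()
        doomed = self.rng.choice(n, size=k, replace=False, p=p)
        for i in doomed:
            if split:
                busiest = int(np.argmax(self.visits))
                self.prototypes[i] = self.prototypes[busiest] + self.rng.integers(
                    -1, 2, self.prototypes.shape[1])
            else:
                self.prototypes[i] = self.rng.integers(
                    0, self.state_range, self.prototypes.shape[1])
            self.visits[i] = 0.0


# Toy usage: linear Q-values over Kanerva features for a handful of actions.
coder = AdaptiveKanervaCoder(n_prototypes=50, state_dim=4, radius=3)
weights = np.zeros((5, 50))                 # 5 actions, one weight per prototype
state = np.array([3, 7, 12, 1])
q_values = weights @ coder.features(state)  # Q(s, a) = w_a . phi(s)
coder.adapt()                               # occasional prototype-optimization step
```

In this sketch, the reinforcement learner would treat the active prototypes as the feature set for standard linear Q-learning updates; the `adapt` step is what the paper's adaptive prototype optimization would replace with its own deletion, generation, and splitting criteria.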
