Multi-agent Task Division Learning in Hide-and-Seek Games

This paper addresses the problem of territory division in Hide-and-Seek games. To achieve efficient seeking performance with multiple seekers, the seekers should agree on searching their own territories and learn to visit good hiding places first, so that the expected time to find the hider is minimized. We propose a learning model that uses Reinforcement Learning in a hierarchical learning structure. Elemental tasks of planning the path to each hiding place are learnt in the lower layer, and the composite task of finding the optimal sequence is then learnt in the higher layer. The proposed approach is examined on a set of different maps, where it converged to the optimal solution.
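The objective stated above, visiting good hiding places first so that the expected time to find the hider is minimized, can be illustrated with a small sketch. It is not the paper's method: the paper learns the visiting sequence with Reinforcement Learning, whereas this example simply enumerates every order for a single seeker. The names `expected_find_time`, `best_order`, `travel_time`, and `hide_prob`, along with the travel times and hiding probabilities, are hypothetical values chosen for illustration.

```python
from itertools import permutations

def expected_find_time(order, travel_time, hide_prob, start="start"):
    """Expected time to find the hider when places are visited in `order`.

    Assumes the hider sits at exactly one place, chosen with probability
    hide_prob[place], and is found as soon as the seeker arrives there.
    """
    elapsed = 0.0   # time spent travelling so far
    expected = 0.0  # probability-weighted arrival times
    current = start
    for place in order:
        elapsed += travel_time[(current, place)]
        expected += hide_prob[place] * elapsed
        current = place
    return expected

def best_order(places, travel_time, hide_prob, start="start"):
    """Brute-force search over visiting orders (illustration only, not RL)."""
    return min(
        permutations(places),
        key=lambda order: expected_find_time(order, travel_time, hide_prob, start),
    )

# Hypothetical two-place map: B is farther from the start but a likelier hiding place.
travel_time = {
    ("start", "A"): 2, ("start", "B"): 3,
    ("A", "B"): 4, ("B", "A"): 4,
}
hide_prob = {"A": 0.25, "B": 0.75}

order = best_order(["A", "B"], travel_time, hide_prob)
```

With these assumed numbers, visiting A first gives an expected time of 5.0 and visiting B first gives 4.0, so the farther but likelier place is searched first. Enumeration grows factorially with the number of hiding places, which is one reason a learned sequence is attractive on larger maps.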
