Aggregate Reinforcement Learning for multi-agent territory division: The Hide-and-Seek game

In many robotics applications, such as disaster rescue, mine detection, robotic surveillance, and warehouse systems, it is crucial to build multi-agent systems (MAS) in which agents cooperate to complete a sequence of tasks. For better performance in such systems, e.g., minimizing duplicated work, the agents need to agree on how to divide and plan that sequence of tasks among themselves. This paper targets the problem of territory division in the children's game of Hide-and-Seek as a test-bed for our proposed approach. The problem is solved in a hierarchical learning scheme using Reinforcement Learning (RL). Based on Q-learning, our learning model is presented in detail: the definition of composite states, actions, and a reward function that handles multi-agent learning. In addition, a revised version of the standard Q-learning update rule is proposed to cope with multiple seekers. The model is examined on a set of different maps, on which it converges to the optimal solutions. After analyzing the complexity of the algorithm, we enhance it with state aggregation (SA) to alleviate the state-space explosion. Two levels of aggregation are devised: topological aggregation and hiding aggregation. After elaborating on how the learning model is modified to handle the aggregation technique, the enhanced model is examined through experiments. Results indicate promising performance, with a higher convergence rate and up to 10× space reduction.
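
As a rough illustration of the learning scheme summarized above, the sketch below shows a standard tabular Q-learning update applied to aggregated states. The `aggregate` mapping, the state and action encodings, and the learning parameters are placeholders chosen for the example; the paper's composite-state definition, its revised multi-seeker update rule, and the topological/hiding aggregations are not specified in the abstract, so this is a generic sketch under those assumptions rather than the authors' implementation.

```python
from collections import defaultdict

def aggregate(state):
    """Hypothetical aggregation map: collapse a raw composite state
    (here, a tuple of seeker cell indices) into a coarser region label.
    The paper's topological/hiding aggregations are not given in the
    abstract, so this grouping is only a placeholder."""
    return tuple(cell // 4 for cell in state)  # group every 4 cells into one region

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Baseline tabular Q-learning update on aggregated states.
    The revised multi-seeker update proposed in the paper is not
    described in the abstract; this shows only the standard rule."""
    s, s_next = aggregate(state), aggregate(next_state)
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s, action)] += alpha * (reward + gamma * best_next - Q[(s, action)])

# Minimal usage with made-up states and actions
Q = defaultdict(float)
actions = ["north", "south", "east", "west"]
q_update(Q, state=(3, 9), action="east", reward=-1.0,
         next_state=(4, 9), actions=actions)
```

Because the Q-table is keyed by aggregated states, many raw configurations share one entry, which is the source of the space reduction the abstract reports; the trade-off is that the aggregation must preserve enough distinctions for the policy to remain near-optimal.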
