Cooperative Q-learning based on maturity of the policy

To improve the convergence speed of reinforcement learning and avoid local optima in multi-robot systems, a new cooperative Q-learning method based on the maturity of the policy is presented. Learning is carried out on a blackboard architecture, which uses all the robots in the training scenario to explore the learning space and collect experience. The reinforcement learning algorithm is divided into two types, constant credit-degree and variable credit-degree; for the constant type, particle swarm optimization (PSO) is adopted to find the optimal credit factor. The method is applied to a fire-disaster response task. Simulation experiments verify the effectiveness of the proposed algorithm.
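As a rough illustration of the blackboard idea described above, the following minimal sketch lets several robots post experiences to one shared Q-table. The abstract does not give the update rule, so this is an assumption: it uses the standard Q-learning update and scales the step by a hypothetical `credit` weight in [0, 1]. The class name `Blackboard`, the method `post`, and the parameters `alpha`, `gamma`, and `credit` are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


class Blackboard:
    """Shared Q-table that every robot reads from and writes to (illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def best_value(self, state):
        """Largest Q-value currently recorded for any action in `state`."""
        return max(self.q[(state, a)] for a in self.actions)

    def post(self, state, action, reward, next_state, credit=1.0):
        """Apply one robot's experience to the shared table.

        This is the standard Q-learning update; scaling the step by
        `credit` is an assumption made for this sketch, not the paper's rule.
        """
        target = reward + self.gamma * self.best_value(next_state)
        td_error = target - self.q[(state, action)]
        self.q[(state, action)] += credit * self.alpha * td_error
        return self.q[(state, action)]


# Two robots report the same transition with different credit weights.
bb = Blackboard(actions=["left", "right"], alpha=0.5, gamma=0.9)
first = bb.post("s0", "right", reward=1.0, next_state="s1", credit=1.0)
second = bb.post("s0", "right", reward=1.0, next_state="s1", credit=0.5)
```

In this sketch a constant credit weight would be a single number shared by all updates, which is the kind of scalar that an optimizer such as PSO could tune; a variable weight would instead be computed per robot or per update.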
