Cooperative Q-learning based on learning automata

The theory of learning automata has already been applied to reinforcement learning in single-agent, single-stage settings. This paper proposes a multi-robot cooperative Q-learning algorithm based on learning automata. Each robot continually updates its action-selection probabilities through a learning automaton and then converts these probabilities into a special form of experience. By sharing this experience with one another, the robots accelerate the learning process. Simulation experiments verify the effectiveness of the algorithm.
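The abstract does not give the update rules, so the following is only a minimal sketch of the ingredients it names, not the paper's algorithm. It assumes a linear reward-inaction automaton for the action probabilities, the standard one-step Q-learning update, and plain averaging as the way robots share experience; the function names, the learning rates, and the averaging scheme are all hypothetical choices made for illustration.

```python
def lri_update(probs, action, reward, rate=0.1):
    """Linear reward-inaction step (assumed scheme, not taken from the paper).

    `reward` is expected in [0, 1]. The chosen action's probability moves
    toward 1 in proportion to the reward; the others shrink so the vector
    still sums to 1. A reward of 0 leaves the probabilities unchanged.
    """
    return [
        p + rate * reward * (1.0 - p) if i == action else p - rate * reward * p
        for i, p in enumerate(probs)
    ]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update, applied in place to table `q`."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def share_experience(prob_tables):
    """Hypothetical sharing rule: average every robot's action probabilities.

    `prob_tables[r][s]` is robot r's probability vector in state s.
    Returns one merged table that each robot could adopt.
    """
    n_robots = len(prob_tables)
    n_states = len(prob_tables[0])
    return [
        [
            sum(table[s][a] for table in prob_tables) / n_robots
            for a in range(len(prob_tables[0][s]))
        ]
        for s in range(n_states)
    ]
```

In this sketch each robot would call `q_update` and `lri_update` after every step and call `share_experience` periodically; how often to share, and whether to weight robots by expertise, are left open here because the abstract does not specify them.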
