A hybrid P2P and master-slave architecture for intelligent multi-agent reinforcement learning in a distributed computing environment: A case study

In this paper, we propose a distributed architecture for reinforcement learning in a multi-agent environment, where agents share learned information over a distributed network. The proposed architecture is a hybrid of master/slave and peer-to-peer designs: a master node assigns a workload (a portion of the terrain) to each node, but it also relays communications among all the other system nodes, which gives the design its peer-to-peer character. The system is loosely coupled in that slave nodes know only of the master node's existence and are concerned only with their own workload (portion of the terrain). Within this architecture, we show how agents communicate with agents on the same or different nodes and share information that pertains to all agents, including agent obstacle barriers. In particular, a main contribution of the paper is multi-agent reinforcement learning in a distributed system where agents have no knowledge of their environment beyond what is available on the computing node on which they run. We show how agents, running on the same or different nodes, coordinate the sharing of their respective environment states and information to perform their tasks collaboratively.
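The hybrid pattern described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all class and method names (`Master`, `Slave`, `relay`, `broadcast_obstacle`) are assumptions, and the terrain is simplified to vertical strips of a grid. It shows the two defining traits of the architecture: slaves know only the master, and inter-node sharing (here, obstacle barriers) is relayed through the master, making communication peer-to-peer in effect.

```python
class Master:
    """Partitions the terrain and relays messages between slave nodes."""

    def __init__(self, terrain_width, terrain_height, num_nodes):
        self.slaves = {}
        # Split the terrain into vertical strips, one workload per node.
        strip = terrain_width // num_nodes
        self.workloads = {
            node_id: (node_id * strip, 0, (node_id + 1) * strip, terrain_height)
            for node_id in range(num_nodes)
        }

    def register(self, node_id, slave):
        """A slave knows only the master; the master tracks all slaves."""
        self.slaves[node_id] = slave
        return self.workloads[node_id]  # (x0, y0, x1, y1): this node's portion

    def relay(self, sender_id, dest_id, message):
        """Peer-to-peer in effect: slaves exchange state through the master."""
        self.slaves[dest_id].receive(sender_id, message)


class Slave:
    """Runs agents on its terrain portion; shares state via the master."""

    def __init__(self, node_id, master):
        self.node_id = node_id
        self.master = master
        self.region = master.register(node_id, self)
        self.shared_state = {}  # e.g. obstacle barriers learned by any node

    def broadcast_obstacle(self, position):
        """Share a discovered obstacle with agents on all other nodes."""
        self.shared_state[position] = "obstacle"
        for dest_id in self.master.slaves:
            if dest_id != self.node_id:
                self.master.relay(self.node_id, dest_id, ("obstacle", position))

    def receive(self, sender_id, message):
        kind, position = message
        if kind == "obstacle":
            self.shared_state[position] = "obstacle"
```

For example, with a 100x50 terrain split across two nodes, an obstacle found by node 0 becomes visible in node 1's shared state after a single relay through the master, without the two slaves ever referencing each other directly.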
