A Multi-Layer Architecture for Cooperative Multi-Agent Systems

In cooperative multi-agent systems with value-based reinforcement learning, agents learn to complete a task through an optimal policy obtained by iterated value-policy improvement. However, designing a policy that avoids social dilemmas and reaches a common consensus among agents remains an important open problem. This article proposes a method that increases the success rate of cooperation by assessing each agent's cooperative tendency. The method learns the rules of cooperation by recording cooperation probabilities for agents in a Layered Cooperation Model (LCM). These probabilities then serve as the basis on which agents make game-theoretic decisions that benefit all participants. The method is evaluated on two cooperative tasks. The results show that the proposed algorithm is more stable and more efficient than comparable methods: it addresses the instability and ambiguity of the win-or-learn-fast policy hill-climbing method (WoLF-PHC) and requires significantly less memory than the Nash Bargaining Solution (NBS).
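The core idea of assessing a cooperative tendency can be illustrated with a minimal sketch. The class below is an assumption for illustration only (the paper's actual LCM is layered and more elaborate): it records an empirical cooperation probability per partner agent and uses it to weigh the expected cooperative payoff against acting alone.

```python
from collections import defaultdict

class CooperationTendency:
    """Hypothetical sketch: track each partner's empirical cooperation
    probability and use it to bias decisions toward joint actions that
    are expected to benefit both agents."""

    def __init__(self, prior=0.5):
        self.attempts = defaultdict(int)   # cooperation attempts per partner
        self.successes = defaultdict(int)  # successful cooperations per partner
        self.prior = prior                 # neutral prior before any observations

    def record(self, partner, succeeded):
        """Update counts after one attempted cooperation with a partner."""
        self.attempts[partner] += 1
        if succeeded:
            self.successes[partner] += 1

    def probability(self, partner):
        """Empirical cooperation probability; falls back to the prior."""
        n = self.attempts[partner]
        return self.successes[partner] / n if n else self.prior

    def choose_cooperation(self, partner, coop_payoff, solo_payoff):
        # Cooperate when the cooperative payoff, discounted by the partner's
        # observed cooperation probability, beats the guaranteed solo payoff.
        return self.probability(partner) * coop_payoff > solo_payoff
```

For example, after eight successful and one failed cooperation with a partner, the estimated probability is 8/9 ≈ 0.89, so a cooperative payoff of 10 is preferred over a solo payoff of 5 but not over 9.5.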
