Paper Information - LEARNING FROM DELAYED REWARDS USING INFLUENCE VALUES APPLIED TO COORDINATION IN MULTI-AGENT SYSTEMS


In this work we propose a new paradigm for learning coordination in multi-agent systems. The approach is inspired by social interaction among people, especially the fact that people tell each other what they think of their actions, and that these opinions influence each other's behavior. We propose a model in which agents learn to coordinate by giving opinions about the actions of other agents and by being influenced by the opinions other agents hold about their own actions. We use the proposed paradigm to develop a modified version of the Q-learning algorithm. The new algorithm is tested and compared with independent learning (IL) and joint action learning (JAL) in a grid problem in which two agents learn to coordinate. Our approach has a higher probability of converging to an optimal equilibrium than the IL and JAL Q-learning algorithms, especially when exploration increases. A further advantage of our algorithm is that, unlike JAL algorithms, it does not need to build a complete model of all joint actions.

Keywords: Influence Value, Reinforcement Learning, Multi-agent coordination.
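As a rough illustration of the idea in the abstract, the sketch below adds an opinion-based term to a standard tabular Q-learning update. It is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract gives no formulas, so the weighted-sum form of the influence term, the function names `influence_value` and `influenced_q_update`, and all parameter names are hypothetical. Each agent keeps a table over its own actions only, in line with the remark that no model of all joint actions is needed.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def influence_value(opinions: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form: weighted sum of the opinions other agents voiced
    about this agent's last action (one weight per other agent)."""
    return sum(w * o for w, o in zip(weights, opinions))


def influenced_q_update(
    q: defaultdict,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    influence: float,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """One tabular Q-learning step with an extra additive influence term.

    With influence == 0.0 this reduces to the ordinary Q-learning update.
    The table is indexed by (state, own action) only.
    """
    best_next = max((q[(next_state, a)] for a in actions), default=0.0)
    target = reward + gamma * best_next + influence  # hypothetical combination
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: one agent receives opinions from two other agents about its action.
q_table = defaultdict(float)
iv = influence_value(opinions=[0.5, -0.1], weights=[0.2, 0.2])
influenced_q_update(q_table, "s0", "left", 1.0, "s1", ["left", "right"], iv, alpha=0.5)
```

Keeping the influence as a separate additive argument makes it easy to compare against plain independent learning by passing `0.0`.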

D. Barrios-Aranibar | L. Gonçalves
