A New Multi-Agent Reinforcement Learning Method Based on Evolving Dynamic Correlation Matrix

Multi-agent reinforcement learning approaches can be roughly classified into two categories. The first is the agent-based approach, which can be implemented in real distributed systems, although most approaches of this type cannot provide meaningful theoretical verification. The second is the more formalized approach, which can provide theoretical results. However, most current algorithms of this kind require unrealistic global communication, which makes them impractical for real distributed systems. In this article, we propose a multi-agent reinforcement learning approach based on a dynamic correlation matrix, in which the meta-parameters are evolved using an evolutionary algorithm. We believe that our approach can fill the gap between the two traditional kinds of multi-agent reinforcement learning approaches by providing both agent-level implementation and system-level convergence verification. The basic idea is that agents learn not only from local environmental feedback, i.e., their own experiences and rewards, but also from other agents’ experiences. In this way, the agents’ learning speed can be increased significantly. The performance of the proposed algorithm is demonstrated on a number of application scenarios, including blackjack games, urban traffic control systems, and multi-robot foraging.
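
The basic idea of learning from other agents' experiences can be illustrated with a minimal sketch. It is not the update rule of the proposed algorithm, which the abstract does not specify. It assumes tabular Q-learning, a fixed correlation matrix `corr` with entries in [0, 1], and a step size scaled by `corr[i][j]` when agent `i` learns from agent `j`. The names `q_update`, `shared_update`, and `corr` are hypothetical, and neither the dynamic adjustment of the matrix nor the evolution of meta-parameters is modeled.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma, n_actions):
    """Apply one tabular Q-learning step to `q`, a dict keyed by (state, action)."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def shared_update(q_tables, corr, experiences, alpha=0.1, gamma=0.9, n_actions=2):
    """Let every agent learn from its own and from other agents' experiences.

    experiences[j] is the (state, action, reward, next_state) tuple of agent j.
    ASSUMPTION: agent i weights agent j's experience by corr[i][j]; its own
    experience always gets full weight.
    """
    for i, q in enumerate(q_tables):
        for j, (s, a, r, s2) in enumerate(experiences):
            weight = 1.0 if i == j else corr[i][j]
            if weight > 0.0:
                q_update(q, s, a, r, s2, alpha * weight, gamma, n_actions)
```

In this sketch, a zero entry in `corr` means the corresponding experience is ignored, so an identity matrix reduces the scheme to independent Q-learning for each agent.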
