Evolutionary Dynamics of Multi-Agent Learning: A Survey

The interaction of multiple autonomous agents gives rise to highly dynamic and nondeterministic environments, contributing to the complexity of applications such as automated financial markets, smart grids, and robotics. Due to the sheer number of situations that may arise, it is not possible to foresee and program the optimal behaviour for all agents beforehand. Consequently, the success of the system depends on the agents being able to learn their optimal behaviour and adapt to new situations or circumstances. The past two decades have seen the emergence of reinforcement learning, in both single- and multi-agent settings, as a strong, robust and adaptive learning paradigm. Progress has been substantial, and a wide range of algorithms is now available. An important challenge in the domain of multi-agent learning is to gain qualitative insights into the resulting system dynamics. In the past decade, tools and methods from evolutionary game theory have been successfully employed to study multi-agent learning dynamics formally in strategic interactions. This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively. Furthermore, new learning algorithms that have been introduced using these evolutionary game theoretic tools are reviewed. The evolutionary models can also be used to study complex strategic interactions; examples of such analyses are given for the domains of automated trading in stock markets and collision avoidance in multi-robot systems. The paper provides a roadmap of the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning by highlighting the main results and accomplishments.
