Theoretical advantages of lenient Q-learners: an evolutionary game theoretic perspective

This paper presents the dynamics of multiple reinforcement learning agents from an Evolutionary Game Theoretic (EGT) perspective. We provide a Replicator Dynamics model for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive teammates' possible mistakes that resulted in lower rewards. We use this extended formal model to visualize the basins of attraction of both traditional and lenient multiagent Q-learners in two benchmark coordination problems. The results indicate that lenience gives learners more accurate estimates of the utility of their actions, which makes convergence to the globally optimal solution more likely. In addition, our research supports the strength of EGT as a backbone for multiagent reinforcement learning.
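The idea of lenience can be illustrated with a minimal sketch. It is not the paper's own formulation: it assumes lenience is modelled as keeping the maximum of `n` rewards obtained against independently drawn teammate actions, and it uses the selection-mutation form of the Q-learning replicator equation from the related literature. The function names, the parameters `alpha`, `tau`, `dt`, and the climb-style payoff matrix are all illustrative assumptions.

```python
import math

def lenient_utilities(payoff, y, n):
    """Expected maximum of n payoffs per row action, assuming the teammate's
    action is drawn i.i.d. from the mixed strategy y (assumed lenience model)."""
    utilities = []
    for row in payoff:
        u = 0.0
        for value in sorted(set(row)):
            # P(max of n draws == value) = P(draw <= value)^n - P(draw < value)^n
            p_le = sum(p for a, p in zip(row, y) if a <= value)
            p_lt = sum(p for a, p in zip(row, y) if a < value)
            u += value * (p_le ** n - p_lt ** n)
        utilities.append(u)
    return utilities

def replicator_step(x, utilities, alpha=0.1, tau=0.1, dt=0.01):
    """One Euler step of a selection-mutation replicator equation.
    Requires every entry of x to be strictly positive."""
    avg = sum(xi * ui for xi, ui in zip(x, utilities))
    new = []
    for xi, ui in zip(x, utilities):
        selection = (alpha / tau) * (ui - avg)
        mutation = alpha * sum(xj * math.log(xj / xi) for xj in x)
        new.append(xi + dt * xi * (selection + mutation))
    total = sum(new)
    return [v / total for v in new]  # renormalise against Euler drift

# Hypothetical climb-style coordination game (illustrative values only).
payoff = [[11, -30, 0], [-30, 7, 6], [0, 0, 5]]
uniform = [1 / 3, 1 / 3, 1 / 3]
plain = lenient_utilities(payoff, uniform, 1)     # n = 1: ordinary expected payoff
lenient = lenient_utilities(payoff, uniform, 10)  # n = 10: miscoordination penalties fade
```

With `n = 1` the function reduces to the ordinary expected payoff, so the same update step covers the traditional learner. Larger `n` shifts each estimate toward the best payoff an action can achieve, which is the intuition behind lenience favouring the optimal joint action.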
