Multiagent learning in large anonymous games

In large systems, it is important for agents to learn to act effectively, but sophisticated multiagent learning algorithms generally do not scale. An alternative approach is to find restricted classes of games in which simple, efficient algorithms converge. It is shown that stage learning efficiently converges to Nash equilibria in large anonymous games if best-reply dynamics converge. Two features that improve convergence are identified. First, rather than making learning more difficult, having more agents is actually beneficial in many settings. Second, providing agents with statistical information about the behavior of others can significantly reduce the number of observations needed.
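To make the setting concrete, below is a minimal, hypothetical sketch of stage learning in an anonymous game, written in Python. It is not the paper's exact algorithm or parameter choices; the function `stage_learning`, its arguments (`num_stages`, `stage_len`, `explore_prob`), and the example payoff are assumptions for illustration only. The idea it captures: play proceeds in stages, each agent treats the population mix as roughly stationary within a stage, samples payoffs of occasional exploratory actions, and switches to its empirically best reply at the stage boundary, so the stage-to-stage process mirrors best-reply dynamics.

```python
import random
from collections import defaultdict

def stage_learning(num_agents, actions, payoff, num_stages=50,
                   stage_len=200, explore_prob=0.1, seed=0):
    """Illustrative stage learning in an anonymous game (hypothetical sketch).

    payoff(action, mix) -> utility of `action` when the population plays the
    distribution `mix` (dict: action -> fraction of agents choosing it).
    Anonymity means payoffs depend only on this mix, not on identities.
    """
    rng = random.Random(seed)
    current = [rng.choice(actions) for _ in range(num_agents)]

    for _ in range(num_stages):
        # Payoff samples accumulated per (agent, action) during this stage.
        totals = [defaultdict(float) for _ in range(num_agents)]
        counts = [defaultdict(int) for _ in range(num_agents)]

        for _ in range(stage_len):
            # Each agent explores with small probability, otherwise repeats
            # its current action; the joint profile defines the action mix.
            played = [rng.choice(actions) if rng.random() < explore_prob
                      else current[i] for i in range(num_agents)]
            mix = {a: played.count(a) / num_agents for a in actions}
            for i, a in enumerate(played):
                totals[i][a] += payoff(a, mix)
                counts[i][a] += 1

        # End of stage: each agent adopts its empirically best reply.
        for i in range(num_agents):
            sampled = [a for a in actions if counts[i][a] > 0]
            current[i] = max(sampled,
                             key=lambda a: totals[i][a] / counts[i][a])

    return current

# Example (also hypothetical): a congestion-style game with two "routes",
# where an action's payoff decreases in the fraction of agents choosing it.
# Best-reply dynamics converge here, so the profile settles near a 50/50 split.
profile = stage_learning(num_agents=100, actions=["A", "B"],
                         payoff=lambda a, mix: 1.0 - mix[a], seed=1)
print(profile.count("A"), profile.count("B"))
```

The sketch also hints at the abstract's two observations: with many agents the sampled mix is a good estimate of the population behavior (so more agents help rather than hurt), and if the mix were reported to agents directly as summary statistics, they would need far fewer of their own observations to identify a best reply.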
