Learning efficient Nash equilibria in distributed systems

An individual's learning rule is completely uncoupled if it does not depend directly on the actions or payoffs of anyone else. We propose a variant of log-linear learning that is completely uncoupled and that selects an efficient (welfare-maximizing) pure Nash equilibrium in all generic n-person games that possess at least one pure Nash equilibrium. In games that do not have such an equilibrium, there is a simple formula that expresses the long-run probability of the various disequilibrium states in terms of two factors: (i) the sum of payoffs over all agents, and (ii) the maximum payoff gain that results from a unilateral deviation by some agent. This welfare/stability trade-off criterion provides a novel framework for analyzing the selection of disequilibrium as well as equilibrium states in n-person games.
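The two quantities named in the abstract can be made concrete on a small example. The sketch below is illustrative only: the 2x2 payoff table and the function names are assumptions, not taken from the paper, and the code implements neither the learning rule nor the long-run probability formula. It only computes, for each action profile, (i) the sum of payoffs and (ii) the largest gain available from a unilateral deviation, and then picks the welfare-maximizing pure Nash equilibrium by brute-force enumeration.

```python
from itertools import product

# Hypothetical two-player game (an assumption for illustration):
# payoffs[profile] = (payoff to player 0, payoff to player 1).
payoffs = {
    (0, 0): (3, 3),
    (0, 1): (0, 2),
    (1, 0): (2, 0),
    (1, 1): (1, 1),
}
n_actions = (2, 2)  # number of actions available to each player


def welfare(profile):
    """Factor (i): sum of payoffs over all agents."""
    return sum(payoffs[profile])


def max_deviation_gain(profile):
    """Factor (ii): largest payoff gain from a unilateral deviation.

    Returns 0 when no agent can gain by deviating alone.
    """
    best = 0
    for i, k in enumerate(n_actions):
        for a in range(k):
            if a == profile[i]:
                continue
            deviated = profile[:i] + (a,) + profile[i + 1:]
            gain = payoffs[deviated][i] - payoffs[profile][i]
            best = max(best, gain)
    return best


def pure_nash_equilibria():
    """Profiles at which no agent gains from a unilateral deviation."""
    profiles = product(*(range(k) for k in n_actions))
    return [p for p in profiles if max_deviation_gain(p) == 0]


def efficient_equilibrium():
    """Welfare-maximizing pure Nash equilibrium, or None if none exists."""
    equilibria = pure_nash_equilibria()
    return max(equilibria, key=welfare) if equilibria else None


if __name__ == "__main__":
    for p in sorted(payoffs):
        print(p, "welfare:", welfare(p), "max gain:", max_deviation_gain(p))
    print("efficient pure Nash equilibrium:", efficient_equilibrium())
```

In this assumed game both (0, 0) and (1, 1) are pure Nash equilibria, and (0, 0) has the higher total payoff, so it is the efficient one in the abstract's sense. Exhaustive enumeration requires global knowledge of the payoff table, which is exactly what a completely uncoupled learner does not have; the sketch is a reference point for the selection criterion, not a model of the dynamics.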
