Just add Pepper: extending learning algorithms for repeated matrix games to repeated Markov games

Learning in multi-agent settings has recently garnered much interest, leading to the development of somewhat effective multi-agent learning (MAL) algorithms for repeated normal-form games. However, general-purpose MAL algorithms for richer environments, such as general-sum repeated stochastic (Markov) games (RSGs), are less advanced. Indeed, existing MAL algorithms for RSGs typically succeed only when the behavior of associates meets specific game-theoretic assumptions and when the game belongs to a particular class (such as zero-sum games). In this paper, we present a new algorithm, called Pepper, that extends MAL algorithms designed for repeated normal-form games to RSGs. We demonstrate that Pepper creates a family of new algorithms, each of whose asymptotic performance in an RSG is reminiscent of its asymptotic performance in related repeated normal-form games. We also show that some algorithms formed with Pepper outperform existing algorithms in an interesting RSG.
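
To make the abstract's idea concrete, the following is a minimal sketch of how a repeated matrix-game learner might be lifted to a stochastic game in the spirit described above: one learner per state, with the payoff fed to each state's learner combining the immediate reward and a discounted estimate of the successor state's value, so that matrix-game learning propagates through the state space. All names here (MatrixGameLearner, PepperStyleWrapper, the env_step callback) are hypothetical illustrations; the toy epsilon-greedy learner merely stands in for any MAL algorithm designed for repeated normal-form games, and this sketch is not the paper's actual construction.

    # A minimal sketch, not the paper's implementation. Names are hypothetical.
    import random
    from collections import defaultdict

    class MatrixGameLearner:
        """Toy repeated matrix-game learner: epsilon-greedy action selection
        with running payoff estimates. A stand-in for any MAL algorithm
        designed for repeated normal-form games."""
        def __init__(self, n_actions, epsilon=0.1):
            self.n_actions = n_actions
            self.epsilon = epsilon
            self.value = [0.0] * n_actions   # running mean payoff per action
            self.count = [0] * n_actions

        def select_action(self):
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            return max(range(self.n_actions), key=lambda a: self.value[a])

        def update(self, action, payoff):
            # Incremental mean of the payoffs observed for this action.
            self.count[action] += 1
            self.value[action] += (payoff - self.value[action]) / self.count[action]

    class PepperStyleWrapper:
        """Runs one matrix-game learner per state of a stochastic game.
        Each state's learner is trained on immediate reward plus the
        discounted estimated value of the successor state, so learning in
        each state's 'matrix game' accounts for future play."""
        def __init__(self, n_actions, gamma=0.95):
            self.gamma = gamma
            self.learners = defaultdict(lambda: MatrixGameLearner(n_actions))

        def state_value(self, state):
            # Optimistic continuation estimate: best action value so far.
            return max(self.learners[state].value)

        def step(self, state, env_step):
            """env_step(state, action) -> (reward, next_state) is assumed to
            encapsulate both the environment and the associates' behavior."""
            learner = self.learners[state]
            action = learner.select_action()
            reward, next_state = env_step(state, action)
            learner.update(action, reward + self.gamma * self.state_value(next_state))
            return next_state

Using the maximum of a state's action-value estimates as its continuation value is an optimistic, off-policy choice made here for simplicity; an on-policy variant would instead track the returns actually obtained from each state under the agents' joint behavior.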
