Correlated Q-Learning

This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.
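The containment claim for general-sum games can be checked on a small example. The sketch below is not the paper's algorithm: it only tests whether a given joint distribution over actions satisfies the correlated-equilibrium constraints in a two-player matrix game, that is, whether either player can gain by deviating from a recommended action. The function name `is_correlated_equilibrium`, the payoff layout, and the Chicken payoffs are illustrative assumptions, not taken from the paper.

```python
def is_correlated_equilibrium(payoffs, dist, tol=1e-9):
    """Check the CE constraints for a two-player matrix game.

    payoffs[i][a][b] is the payoff to player i (0 = row, 1 = column)
    when row plays a and column plays b (hypothetical layout).
    dist[a][b] is the probability of recommending joint action (a, b).
    """
    n_rows, n_cols = len(dist), len(dist[0])

    # Row player: following recommendation a must be at least as good
    # as switching to any a_dev, weighted by the joint distribution.
    for a in range(n_rows):
        for a_dev in range(n_rows):
            gain = sum(
                dist[a][b] * (payoffs[0][a_dev][b] - payoffs[0][a][b])
                for b in range(n_cols)
            )
            if gain > tol:
                return False

    # Column player: the same constraint for recommendation b.
    for b in range(n_cols):
        for b_dev in range(n_cols):
            gain = sum(
                dist[a][b] * (payoffs[1][a][b_dev] - payoffs[1][a][b])
                for a in range(n_rows)
            )
            if gain > tol:
                return False
    return True


# Illustrative game of Chicken (assumed payoffs): action 0 = dare, 1 = chicken.
CHICKEN = [
    [[0, 7], [2, 6]],  # row player's payoffs
    [[0, 2], [7, 6]],  # column player's payoffs
]

# A pure Nash equilibrium: row dares, column chickens.
pure_nash = [[0.0, 1.0], [0.0, 0.0]]

# A correlated equilibrium that is not a product of independent strategies.
third = 1.0 / 3.0
correlated = [[0.0, third], [third, third]]

# Both players chicken: not an equilibrium, since either gains by daring.
both_chicken = [[0.0, 0.0], [0.0, 1.0]]

for name, d in [("pure Nash", pure_nash),
                ("correlated", correlated),
                ("both chicken", both_chicken)]:
    print(name, is_correlated_equilibrium(CHICKEN, d))
```

The pure Nash equilibrium passes the check, as the containment implies, and so does the correlated distribution, which no pair of independent mixed strategies can produce. Because the constraints are linear in `dist`, a correlated equilibrium can be found by linear programming.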
