Correlated Q-Learning

This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.
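The containment claim in the abstract (every Nash equilibrium is also a correlated equilibrium) can be illustrated on a single-stage game. The following is a minimal sketch, not the paper's algorithm: it only checks whether a given joint distribution over action pairs satisfies the correlated-equilibrium incentive constraints in a two-player matrix game. The function name `is_correlated_equilibrium`, the payoff layout, and the game of Chicken payoffs are assumptions chosen for illustration and do not come from the paper's testbed.

```python
from itertools import product


def is_correlated_equilibrium(payoffs, dist, tol=1e-9):
    """Check the correlated-equilibrium constraints for a two-player game.

    payoffs[i][a0][a1] is player i's payoff when player 0 plays a0 and
    player 1 plays a1. dist[a0][a1] is the joint probability with which a
    mediator recommends the action pair (a0, a1). Hypothetical helper.
    """
    n0, n1 = len(dist), len(dist[0])

    # Player 0: following recommendation `rec` must be at least as good as
    # deviating to `dev`, weighted by the probability of each recommendation.
    for rec, dev in product(range(n0), repeat=2):
        gain = sum(
            dist[rec][b] * (payoffs[0][dev][b] - payoffs[0][rec][b])
            for b in range(n1)
        )
        if gain > tol:
            return False

    # Player 1: the same constraint, over columns of the joint distribution.
    for rec, dev in product(range(n1), repeat=2):
        gain = sum(
            dist[a][rec] * (payoffs[1][a][dev] - payoffs[1][a][rec])
            for a in range(n0)
        )
        if gain > tol:
            return False

    return True


# Game of Chicken (illustrative payoffs): action 0 = Dare, action 1 = Chicken.
CHICKEN = [
    [[0, 7], [2, 6]],  # player 0's payoffs
    [[0, 2], [7, 6]],  # player 1's payoffs
]

# Pure Nash equilibrium (Dare, Chicken) written as a joint distribution.
pure_nash = [[0.0, 1.0], [0.0, 0.0]]

# Mixed Nash equilibrium: each player dares with probability 1/3, so the
# joint distribution is the product of the two marginals.
p = [1 / 3, 2 / 3]
mixed_nash = [[p[a] * p[b] for b in range(2)] for a in range(2)]

# A correlated equilibrium that is not a product distribution: the mediator
# never recommends (Dare, Dare).
correlated = [[0.0, 1 / 3], [1 / 3, 1 / 3]]

# Not an equilibrium: both players always dare.
both_dare = [[1.0, 0.0], [0.0, 0.0]]

print(is_correlated_equilibrium(CHICKEN, pure_nash))
print(is_correlated_equilibrium(CHICKEN, mixed_nash))
print(is_correlated_equilibrium(CHICKEN, correlated))
print(is_correlated_equilibrium(CHICKEN, both_dare))
```

Both Nash equilibria pass the check, as does the third distribution, which places zero probability on (Dare, Dare) and therefore cannot be written as a product of independent mixed strategies. This is the sense in which the set of correlated equilibria is a superset of the set of Nash equilibria.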

Keith B. Hall | Amy Greenwald
