Existence of Multiagent Equilibria with Limited Agents

Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular the concept of Nash equilibria. Learners that directly learn an equilibrium obviously rely on its existence. Learners that instead seek to play optimally with respect to the other players also depend on equilibria, since equilibria are fixed points for learning. From another perspective, agents with limitations are real and common. These limitations may be undesired physical limitations or self-imposed rational limitations, such as the abstraction and approximation techniques used to make learning tractable. This article explores the interaction of these two important concepts: equilibria and limitations in learning. We introduce the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we make three major contributions: (i) a counterexample for the general existence of equilibria with limitations, (ii) sufficient conditions on limitations that preserve their existence, and (iii) three general classes of games and limitations that satisfy these conditions. We then present empirical results from a specific multiagent learning algorithm applied to a specific instance of limited agents. These results demonstrate that learning with limitations is feasible when the conditions outlined by our theoretical analysis hold.
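The question of whether an equilibrium survives once agents are limited can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the article's counterexample or formalism: it uses matching pennies, models a limitation as a finite set of strategies a player is able to play, and checks for strategy pairs where neither player can gain by deviating within its own set. The function names and the term "restricted equilibrium" are chosen here for illustration.

```python
from itertools import product

# Matching pennies: payoff to the row player; the column player receives the negative.
PAYOFF = [[1, -1], [-1, 1]]


def value(row, col):
    """Expected payoff to the row player for mixed strategies row and col."""
    return sum(row[i] * col[j] * PAYOFF[i][j] for i in range(2) for j in range(2))


def is_restricted_equilibrium(row, col, row_set, col_set, tol=1e-12):
    """True if neither player gains by deviating within its own restricted set."""
    current = value(row, col)
    row_ok = all(value(alt, col) <= current + tol for alt in row_set)
    # The column player maximizes the negated payoff (zero-sum game).
    col_ok = all(-value(row, alt) <= -current + tol for alt in col_set)
    return row_ok and col_ok


def restricted_equilibria(row_set, col_set):
    """All strategy pairs that are equilibria relative to the restricted sets."""
    return [
        (r, c)
        for r, c in product(row_set, col_set)
        if is_restricted_equilibrium(r, c, row_set, col_set)
    ]


# Hypothetical limitation: players can only play deterministically.
PURE = [(1.0, 0.0), (0.0, 1.0)]
# Same players, additionally able to randomize uniformly.
WITH_UNIFORM = PURE + [(0.5, 0.5)]

print(restricted_equilibria(PURE, PURE))
print(restricted_equilibria(WITH_UNIFORM, WITH_UNIFORM))
```

With both players limited to deterministic play, no pair of strategies is stable, so the first list is empty. Once the uniform mixed strategy is available to both, that pair is the only one that passes the check. The example only shows that a limitation can remove every equilibrium; the conditions under which equilibria are preserved are the subject of the article itself.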

Manuela M. Veloso | Michael H. Bowling
