A Framework for Sequential Planning in Multi-Agent Settings

This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over the physical states of the environment and over models of other agents, and they use Bayesian updates to maintain these beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to the agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and the piecewise linearity and convexity of the value functions, carry over to our framework. Our approach complements the more traditional approach to interactive settings that uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria, which may be non-unique and do not capture off-equilibrium behaviors. We do so at the cost of having to represent, process, and continuously revise models of other agents. Since the agents' beliefs may be arbitrarily nested, optimal solutions to decision-making problems are only asymptotically computable; however, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
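To make the belief-update idea concrete, below is a minimal sketch in Python/NumPy under strong simplifying assumptions that the paper itself does not make: the other agent j is described by a small finite set of candidate models, each summarized by a fixed stationary policy, so a model does not itself carry nested beliefs. Agent i then runs an ordinary Bayes filter over interactive states (s, m_j), predicting j's action from each candidate model. All names and the toy problem dimensions here are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative sizes: physical states, actions of i and j,
# observations of i, and candidate models of j.
S, Ai, Aj, Oi, M = 3, 2, 2, 2, 2

rng = np.random.default_rng(0)

def normalize(x, axis=None):
    return x / x.sum(axis=axis, keepdims=axis is not None)

# T[s, ai, aj, s'] : transition model under the joint action (ai, aj)
T = normalize(rng.random((S, Ai, Aj, S)), axis=3)
# O[s', ai, aj, oi] : agent i's observation model
O = normalize(rng.random((S, Ai, Aj, Oi)), axis=3)
# policies[m][s, aj] : stationary policy attributed to j under model m
policies = [normalize(rng.random((S, Aj)), axis=1) for _ in range(M)]

def belief_update(b, ai, oi):
    """Bayes filter over interactive states (s, m_j).

    b[s, m] is i's current joint belief over the physical state and the
    model of j; returns the posterior after i performs action ai and
    observes oi, marginalizing over j's action as predicted by each
    candidate model. Static models are never revised here, unlike in
    the full framework.
    """
    b_next = np.zeros((S, M))
    for m in range(M):
        for s_next in range(S):
            total = 0.0
            for s in range(S):
                for aj in range(Aj):
                    total += (b[s, m]
                              * policies[m][s, aj]        # predict j's action
                              * T[s, ai, aj, s_next]      # state dynamics
                              * O[s_next, ai, aj, oi])    # likelihood of oi
            b_next[s_next, m] = total
    return b_next / b_next.sum()   # renormalize (Bayes' rule)

# Example: start uniform over states and models, act, observe, update.
b0 = np.full((S, M), 1.0 / (S * M))
b1 = belief_update(b0, ai=0, oi=1)
print(b1)
```

Folding the model of j into the state is what lets the familiar single-agent Bayesian update apply unchanged. The cost, as the abstract notes, is that in the full framework the models themselves contain beliefs (possibly arbitrarily nested) and must be revised recursively, which is why exact solutions are only asymptotically computable and approximations are needed in practice.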
