Graphical models for interactive POMDPs: representations and solutions

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models, called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing a situation into chance and decision variables and modeling the dependencies between them. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. An I-DID may be used to compute the policy of an agent, given its belief, as the agent acts and observes in a setting populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using standard algorithms applicable to DIDs. Solving I-DIDs exactly requires knowing the solutions of the possible models of the other agents, and the space of these models grows exponentially with the number of time steps. We present a method for solving I-DIDs approximately by limiting the number of the other agents' candidate models at each time step to a constant. We do this by clustering models that are likely to be behaviorally equivalent and selecting a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
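
To make the growth of the model space concrete (a rough bound in the spirit of the I-DID literature, not a result quoted from the paper): updating a candidate model of another agent j for every combination of j's actions and observations yields up to |A_j| * |Omega_j| successor models per model, so after t steps an initial set of |M_0| candidate models can expand to |M_0| * (|A_j| * |Omega_j|)^t reachable candidates. This blow-up is what motivates bounding the candidate set at each step.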

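The clustering approximation lends itself to a compact sketch. The Python snippet below is a minimal illustration, not the paper's implementation: it assumes each candidate model of the other agent has already been solved and summarized as a numeric feature vector (for instance, the expected values its policy attains at a set of sampled beliefs), clusters these vectors with plain k-means in the spirit of MacQueen's method, retains the member nearest each cluster center as the representative of its cluster of likely behaviorally equivalent models, and transfers the cluster's probability mass to that representative. The feature construction, the iteration count, and the mass-redistribution rule are all illustrative assumptions.

import random

def dist2(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    # Component-wise mean of a non-empty list of vectors.
    return [sum(xs) / len(points) for xs in zip(*points)]

def prune_models(models, probs, k, iters=20, seed=0):
    # models: one feature vector per candidate model of the other agent
    #         (assumed here to summarize the model's solution, e.g. its
    #         expected values at a set of sampled beliefs).
    # probs:  probability the modeling agent assigns to each candidate model.
    # k:      number of representative models retained at this time step.
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(models, k)]
    assign = [0] * len(models)
    for _ in range(iters):
        # k-means: assign each model to the nearest center, then move
        # each center to the mean of its assigned members.
        assign = [min(range(k), key=lambda c: dist2(m, centers[c]))
                  for m in models]
        for c in range(k):
            members = [m for m, a in zip(models, assign) if a == c]
            if members:
                centers[c] = mean(members)
    reps, mass = [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue  # empty cluster: no representative needed
        # Keep the member closest to the center as the cluster's
        # representative and give it the cluster's whole probability mass.
        rep = min(idx, key=lambda i: dist2(models[i], centers[c]))
        reps.append(models[rep])
        mass.append(sum(probs[i] for i in idx))
    return reps, mass

# Example: eight candidate models summarized by 2-D features, pruned to two.
models = [[0.10, 0.90], [0.12, 0.88], [0.90, 0.10], [0.88, 0.12],
          [0.11, 0.91], [0.89, 0.09], [0.10, 0.92], [0.90, 0.11]]
probs = [1.0 / 8] * 8
reps, mass = prune_models(models, probs, k=2)
print(reps)  # two representative feature vectors
print(mass)  # their aggregated probabilities (sum to 1.0)

Run at every time step with a fixed k, a procedure of this shape keeps the number of candidate models constant, at the cost of the approximation error whose bound the paper discusses.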