Graphical models for interactive POMDPs: representations and solutions

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models, called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing a situation into chance and decision variables and modeling the dependencies between them. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. An I-DID may be used to compute the policy of an agent, given its belief, as the agent acts and observes in a setting populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using standard algorithms applicable to DIDs. Solving I-DIDs exactly requires knowing the solutions of the possible models of the other agents, and the space of these models grows exponentially with the number of time steps. We present a method for solving I-DIDs approximately by limiting the number of the other agents' candidate models at each time step to a constant. We do this by clustering models that are likely to be behaviorally equivalent and selecting a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
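
To make the growth of the model space concrete (a rough bound in the spirit of the I-DID literature, not a result quoted from the paper): updating a candidate model of another agent j for every combination of j's actions and observations yields up to |A_j| * |Omega_j| successor models per model, so after t steps an initial set of |M_0| candidate models can expand to |M_0| * (|A_j| * |Omega_j|)^t reachable candidates. This blow-up is what motivates bounding the candidate set at each step.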

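The clustering approximation lends itself to a compact sketch. The Python snippet below is a minimal illustration, not the paper's implementation: it assumes each candidate model of the other agent has already been solved and summarized as a numeric feature vector (for instance, the expected values its policy attains at a set of sampled beliefs), clusters these vectors with plain k-means in the spirit of MacQueen's method, retains the member nearest each cluster center as the representative of its cluster of likely behaviorally equivalent models, and transfers the cluster's probability mass to that representative. The feature construction, the iteration count, and the mass-redistribution rule are all illustrative assumptions.

import random

def dist2(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    # Component-wise mean of a non-empty list of vectors.
    return [sum(xs) / len(points) for xs in zip(*points)]

def prune_models(models, probs, k, iters=20, seed=0):
    # models: one feature vector per candidate model of the other agent
    #         (assumed here to summarize the model's solution, e.g. its
    #         expected values at a set of sampled beliefs).
    # probs:  probability the modeling agent assigns to each candidate model.
    # k:      number of representative models retained at this time step.
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(models, k)]
    assign = [0] * len(models)
    for _ in range(iters):
        # k-means: assign each model to the nearest center, then move
        # each center to the mean of its assigned members.
        assign = [min(range(k), key=lambda c: dist2(m, centers[c]))
                  for m in models]
        for c in range(k):
            members = [m for m, a in zip(models, assign) if a == c]
            if members:
                centers[c] = mean(members)
    reps, mass = [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue  # empty cluster: no representative needed
        # Keep the member closest to the center as the cluster's
        # representative and give it the cluster's whole probability mass.
        rep = min(idx, key=lambda i: dist2(models[i], centers[c]))
        reps.append(models[rep])
        mass.append(sum(probs[i] for i in idx))
    return reps, mass

# Example: eight candidate models summarized by 2-D features, pruned to two.
models = [[0.10, 0.90], [0.12, 0.88], [0.90, 0.10], [0.88, 0.12],
          [0.11, 0.91], [0.89, 0.09], [0.10, 0.92], [0.90, 0.11]]
probs = [1.0 / 8] * 8
reps, mass = prune_models(models, probs, k=2)
print(reps)  # two representative feature vectors
print(mass)  # their aggregated probabilities (sum to 1.0)

Run at every time step with a fixed k, a procedure of this shape keeps the number of candidate models constant, at the cost of the approximation error whose bound the paper discusses.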