Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single-agent settings. Interactive POMDPs (I-POMDPs) extend POMDPs to multiagent settings by replacing POMDP belief spaces with interactive hierarchical belief systems, which represent an agent's belief about the physical world, about the beliefs of other agents, and about their beliefs about others' beliefs. This extension makes the already difficult problem of computing solutions, driven by the complexity of the belief and policy spaces, even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends through the levels of the interactive belief hierarchy, sampling and propagating beliefs at each level. The interactive PF mitigates the belief-space complexity, but it does not address the policy-space complexity. To mitigate the policy-space complexity, sometimes called the curse of history, we use a complementary method based on sampling likely observations while building the lookahead reachability tree. While this approach does not eliminate the curse of history, it substantially reduces its impact. We provide experimental results and chart future work.
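To make the recursive structure of the interactive PF concrete, the following is a minimal sketch, assuming a level-k particle is a pair of a physical state and a particle set one level down. Every callback in the `models` dictionary (`transition0`, `obs_likelihood0`, `transition`, `obs_likelihood`, `act`, `sample_other_obs`) is a hypothetical stand-in for illustration, not the paper's actual interface.

```python
import random

def resample(particles, weights, n):
    """Resample n particles in proportion to their weights (uniformly if
    every weight is zero, which can happen with sparse observation models)."""
    total = sum(weights)
    probs = [w / total for w in weights] if total > 0 else None
    return random.choices(particles, weights=probs, k=n)

def interactive_pf(particles, action, observation, level, models, n):
    """One step of an interactive particle filter. A level-0 particle is a
    physical state; a level-k particle (k > 0) is a pair
    (physical_state, other_agent_particles). The filter recurses through
    the belief hierarchy, sampling and propagating beliefs at each level."""
    if level == 0:
        # Plain bootstrap filter: the other agent's influence is folded
        # into the noise of the level-0 transition model (an assumption
        # of this sketch).
        propagated = [models['transition0'](s, action) for s in particles]
        weights = [models['obs_likelihood0'](observation, s, action)
                   for s in propagated]
        return resample(propagated, weights, n)
    new_particles, weights = [], []
    for state, other_particles in particles:
        # Sample the other agent's action from its lower-level model.
        other_action = models['act'](other_particles, level - 1)
        next_state = models['transition'](state, action, other_action)
        # Sample an observation for the other agent and update its belief
        # by descending one level of the hierarchy.
        other_obs = models['sample_other_obs'](next_state, other_action)
        next_other = interactive_pf(other_particles, other_action, other_obs,
                                    level - 1, models, n)
        new_particles.append((next_state, next_other))
        weights.append(models['obs_likelihood'](observation, next_state,
                                                action, other_action))
    return resample(new_particles, weights, n)
```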
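The complementary curse-of-history mitigation can be sketched in the same spirit: rather than branching on every possible observation when building the lookahead reachability tree, expand only a few sampled observation branches per action. The callbacks (`expected_reward`, `sample_observation`, `update_belief`) are again illustrative assumptions; `update_belief` could be one step of the interactive PF sketched above.

```python
def lookahead_value(belief, depth, actions, models, n_obs_samples, gamma=0.95):
    """Estimate the value of a belief by depth-bounded lookahead, expanding
    only n_obs_samples sampled observation branches per action instead of
    the full observation set, which mitigates the curse of history."""
    if depth == 0:
        return 0.0
    best = float('-inf')
    for action in actions:
        value = models['expected_reward'](belief, action)
        future = 0.0
        for _ in range(n_obs_samples):
            # Sample a likely observation given the current belief and action,
            # then recurse on the updated belief.
            obs = models['sample_observation'](belief, action)
            next_belief = models['update_belief'](belief, action, obs)
            future += lookahead_value(next_belief, depth - 1, actions,
                                      models, n_obs_samples, gamma)
        value += gamma * future / n_obs_samples
        best = max(best, value)
    return best
```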
