An online POMDP algorithm for complex multiagent environments

In this paper, we present RTBSS (Real-Time Belief Space Search), an online method for POMDPs based on a look-ahead search that finds the best action to execute at each decision cycle. This avoids the overwhelming complexity of computing a policy for every possible situation in advance. We show that the method is particularly well suited to large real-time environments, where offline approaches are inapplicable because of their complexity. We first describe the formalism of our online method, followed by results on standard POMDP problems. We then present an adaptation of the method to a complex multiagent environment, together with results demonstrating its effectiveness in such settings.
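To make the look-ahead idea concrete, here is a minimal sketch of a depth-limited search over the reachable belief tree, using the classic tiger problem as a toy flat POMDP. All names, numbers, and the depth-0 fallback below are illustrative assumptions; RTBSS itself refines this basic scheme (the sketch omits, for instance, any pruning), so this is not the authors' implementation.

```python
import numpy as np

# Hypothetical toy domain: the tiger problem (2 states, 3 actions, 2 observations).
# All numbers here are the usual textbook values, not taken from the paper.
#   states:       0 = tiger-left, 1 = tiger-right
#   actions:      0 = listen, 1 = open-left, 2 = open-right
#   observations: 0 = hear-left, 1 = hear-right

T = np.zeros((3, 2, 2))            # T[a, s, s'] = P(s' | s, a)
T[0] = np.eye(2)                   # listening does not move the tiger
T[1] = T[2] = np.full((2, 2), .5)  # opening a door resets the problem

O = np.zeros((3, 2, 2))            # O[a, s', z] = P(z | s', a)
O[0] = [[.85, .15], [.15, .85]]    # listening is 85% accurate
O[1] = O[2] = np.full((2, 2), .5)  # no information after opening

R = np.array([[-1., -1.],          # R[a, s]: listening costs 1
              [-100., 10.],        # opening the tiger's door costs 100,
              [10., -100.]])       # the other door pays 10

GAMMA = 0.95

def belief_update(b, a, z):
    """tau(b, a, z): Bayes update of belief b after acting and observing."""
    bp = O[a, :, z] * (T[a].T @ b)   # P(z|s',a) * sum_s P(s'|s,a) b(s)
    p_z = bp.sum()                   # P(z | b, a), the normalizing constant
    return bp / max(p_z, 1e-12), p_z

def lookahead(b, depth):
    """Depth-limited expectimax over the reachable belief tree.

    Returns (value, best_action). At depth 0 we fall back to the
    immediate reward of the greedy action as a crude utility estimate.
    """
    if depth == 0:
        q = R @ b                    # expected immediate reward per action
        return q.max(), int(q.argmax())
    best_v, best_a = -np.inf, 0
    for a in range(R.shape[0]):
        v = float(R[a] @ b)
        for z in range(O.shape[2]):
            bp, p_z = belief_update(b, a, z)
            if p_z > 1e-12:          # skip unreachable observations
                v += GAMMA * p_z * lookahead(bp, depth - 1)[0]
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

# One decision cycle from the uniform belief, searching 3 steps ahead.
value, action = lookahead(np.array([.5, .5]), depth=3)
print(f"best action: {action}, estimated value: {value:.2f}")
```

Each call explores on the order of (|A| x |Z|)^D beliefs for depth D, so the per-cycle cost depends only on the horizon of the search, not on the number of situations a full offline policy would have to cover; this is the trade-off the abstract refers to.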
