Solving POMDPs using quadratically constrained linear programs

Since the early 1990s, Markov decision processes (MDPs) and their partially observable counterparts (POMDPs) have been widely used by the AI community for planning under uncertainty. POMDPs offer a rich language to describe situations involving uncertainty about the domain, stochastic actions, noisy observations, and a variety of possible objective functions. Even though an optimal solution may be concise, current exact algorithms based on dynamic programming often require an intractable amount of space. POMDP approximation algorithms can operate within a limited amount of memory, but as a consequence they offer only very weak theoretical guarantees. In contrast, we describe a new approach that addresses the space requirement of POMDP algorithms while maintaining well-defined optimality guarantees.
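To make the idea concrete, here is a minimal sketch of the kind of quadratically constrained formulation involved, for a fixed-size stochastic finite-state controller; the variable names x, y, b_0, q_0 and the exact arrangement of the constraints are illustrative assumptions, not necessarily the paper's notation. Let x(q',a,q,o) denote the probability of taking action a and moving to controller node q' given that the controller is in node q and subsequently receives observation o, and let y(q,s) denote the value of node q in world state s. Optimizing the controller then amounts to:

\begin{align*}
\max_{x,\;y} \quad & \sum_{s} b_0(s)\, y(q_0, s) \\
\text{s.t.} \quad & y(q,s) \;=\; \sum_{a} \Big( \sum_{q'} x(q',a,q,o_1) \Big) R(s,a) \\
& \qquad + \gamma \sum_{a,\,s',\,o,\,q'} P(s' \mid s,a)\, O(o \mid s',a)\, x(q',a,q,o)\, y(q',s') && \forall\, q, s \\
& \sum_{q',\,a} x(q',a,q,o) \;=\; 1 && \forall\, q, o \\
& \sum_{q'} x(q',a,q,o) \;=\; \sum_{q'} x(q',a,q,o_1) && \forall\, q, a, o \\
& x(q',a,q,o) \;\ge\; 0 && \forall\, q', a, q, o
\end{align*}

The objective is linear in y, while the Bellman constraints contain bilinear products of x and y, which is what makes the program quadratically constrained rather than linear. The third constraint forces the action distribution at a node to be independent of the observation, since the action is chosen before the observation arrives. Fixing the controller size bounds the number of variables in advance, which is how this style of formulation caps memory use while still characterizing the best controller of that size.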
