An approximate algorithm for solving oracular POMDPs

We propose a new approximate algorithm, LA-JIV (lookahead J-MDP information value), to solve oracular partially observable Markov decision problems (OPOMDPs), a special type of POMDP in which standard observations are replaced by an "oracle" that can be consulted for full state information at a fixed cost. We previously introduced JIV (J-MDP information value), a heuristic algorithm for solving OPOMDPs that uses the solution of the underlying MDP to weigh the value of consulting the oracle against the value of taking a state-modifying action. While efficient, JIV rarely finds the optimal solution. In this paper, we extend JIV with lookahead, thereby permitting arbitrarily small deviation from the optimal policy's long-term expected reward at the cost of added computation time. The depth of the lookahead is a parameter that governs this tradeoff; by iteratively increasing this depth, we obtain an anytime algorithm that yields an ever-improving solution. LA-JIV exploits the OPOMDP framework's unique characteristics to outperform general-purpose approximate POMDP solvers; in fact, we prove that LA-JIV is a polynomial-time approximation scheme (PTAS) with respect to the size of the state and observation spaces, thereby showing rigorously that OPOMDPs are "easier" than POMDPs. Finally, we substantiate our theoretical results via an empirical analysis of a benchmark OPOMDP instance.
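The iterative-deepening lookahead described above can be sketched in a few lines. The following is a hypothetical toy illustration, not the authors' implementation: it uses a two-state, two-action OPOMDP, a QMDP-style heuristic (expected underlying-MDP value under the belief) at the lookahead leaves, and at every interior node compares consulting the oracle at a fixed cost against acting "blindly" and propagating the belief. All names (`mdp_value`, `belief_value`, `ORACLE_COST`, the transition and reward tables) are our own assumptions for illustration.

```python
import functools

# Toy two-state OPOMDP (illustrative numbers, not from the paper).
ACTIONS = (0, 1)
GAMMA = 0.95
ORACLE_COST = 0.5  # fixed cost of consulting the oracle

# T[s][a] maps next state -> probability; R[s][a] is the immediate reward.
T = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
     1: {0: {0: 0.1, 1: 0.9}, 1: {0: 0.8, 1: 0.2}}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

@functools.lru_cache(maxsize=None)
def mdp_value(s, horizon=30):
    """Finite-horizon approximation of the underlying MDP's value V(s)."""
    if horizon == 0:
        return 0.0
    return max(R[s][a] + GAMMA * sum(p * mdp_value(s2, horizon - 1)
                                     for s2, p in T[s][a].items())
               for a in ACTIONS)

def next_belief(b, a):
    """P(next state = 1) after action a, with no observation received."""
    return (1 - b) * T[0][a][1] + b * T[1][a][1]

def belief_value(b, depth):
    """Depth-limited lookahead over beliefs b = P(state = 1)."""
    if depth == 0:
        # QMDP-style leaf heuristic: expected MDP value under the belief.
        return (1 - b) * mdp_value(0) + b * mdp_value(1)
    # Option 1: pay the oracle, learn the state, then act optimally in the MDP.
    consult = -ORACLE_COST + (1 - b) * mdp_value(0) + b * mdp_value(1)
    # Option 2: take a state-modifying action and recurse on the new belief.
    act = max((1 - b) * R[0][a] + b * R[1][a]
              + GAMMA * belief_value(next_belief(b, a), depth - 1)
              for a in ACTIONS)
    return max(consult, act)

# Anytime use: deepen the lookahead iteratively, refining the estimate
# until the computation budget runs out.
estimates = [belief_value(0.5, d) for d in range(1, 5)]
```

Each pass through the loop re-runs the lookahead one level deeper, so the computation can be interrupted at any time and still return the best estimate found so far, matching the anytime behavior claimed for LA-JIV.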
