Anytime Point-Based Approximations for Large POMDPs

The Partially Observable Markov Decision Process (POMDP) has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving is to perform value backups at specific belief points rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points that work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
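
To make the idea of a value backup at a single belief point concrete, the following is a minimal sketch of the standard point-based backup for a discrete POMDP. It is an illustration under stated assumptions, not the paper's implementation: the function name `point_backup`, the nested-list layout of `T`, `O`, and `R`, and the convention that rewards depend only on state and action are all hypothetical choices made here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def point_backup(b, Gamma, T, O, R, gamma):
    """Return the alpha-vector that is best at belief b after one backup.

    Assumed (hypothetical) layout:
      b          belief, list of length |S|
      Gamma      current value function, list of alpha-vectors (length |S|)
      T[a][s][t] probability of moving from state s to t under action a
      O[a][t][o] probability of observing o after reaching t via action a
      R[a][s]    immediate reward for action a in state s
      gamma      discount factor
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(R)):
        g_a = list(R[a])  # start from the immediate reward
        for o in range(n_obs):
            # Project every alpha-vector back through (a, o).
            projected = [
                [
                    sum(T[a][s][t] * O[a][t][o] * alpha[t] for t in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in Gamma
            ]
            # Keep only the projection that is best at this belief point.
            g_ao = max(projected, key=lambda g: dot(b, g))
            for s in range(n_states):
                g_a[s] += gamma * g_ao[s]
        val = dot(b, g_a)
        if val > best_val:
            best_vec, best_val = g_a, val
    return best_vec
```

Because the maximization is carried out only at the given belief, each backup produces a single alpha-vector, so repeating it over a finite set of belief points yields at most one vector per point rather than the exponentially growing set an exact backup would produce.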
