Point-based approximations for fast POMDP solving

The Partially Observable Markov Decision Process (POMDP) has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are computationally intractable for all but the smallest problems, and, until recently, the efficient approximations that were available offered few theoretical guarantees on performance. This paper describes a new class of approximate POMDP algorithms, called Point-Based Value Iteration (PBVI), which offers both good empirical performance (in simulation and on robotic tasks) and solid theoretical guarantees.
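The core operation behind PBVI is a point-based value backup over a fixed, finite set of belief points. As a concrete illustration, here is a minimal sketch of one such backup for a small tabular POMDP. All identifiers here (pbvi_backup, B, Gamma, T, O, R) are illustrative assumptions, not notation from the paper.

```python
# A minimal sketch of one point-based value-iteration backup,
# assuming a small tabular POMDP. Names are illustrative only.
import numpy as np

def pbvi_backup(B, Gamma, T, O, R, gamma):
    """One point-based backup over a fixed set of belief points.

    B     : (n_beliefs, n_states)  belief points
    Gamma : non-empty list of alpha vectors, each of shape (n_states,)
    T     : (n_actions, n_states, n_states)  T[a, s, s'] = P(s' | s, a)
    O     : (n_actions, n_states, n_obs)     O[a, s', o] = P(o | s', a)
    R     : (n_actions, n_states)            R[a, s] = immediate reward
    gamma : discount factor in (0, 1)
    Returns a new list of alpha vectors, one per belief point.
    """
    n_actions, n_states, _ = T.shape
    n_obs = O.shape[2]

    # Step 1: project each alpha vector through every (action, observation)
    # pair: g[a, o, i](s) = gamma * sum_s' T[a, s, s'] O[a, s', o] alpha_i(s').
    g = np.empty((n_actions, n_obs, len(Gamma), n_states))
    for a in range(n_actions):
        for o in range(n_obs):
            for i, alpha in enumerate(Gamma):
                g[a, o, i] = gamma * T[a] @ (O[a, :, o] * alpha)

    # Step 2: for each belief point, keep only the best projection per
    # observation, then the best action overall. Maximizing only at the
    # points in B is what makes the backup "point-based".
    new_Gamma = []
    for b in B:
        best_val, best_alpha = -np.inf, None
        for a in range(n_actions):
            alpha_a = R[a] + sum(g[a, o, np.argmax(g[a, o] @ b)]
                                 for o in range(n_obs))
            val = b @ alpha_a
            if val > best_val:
                best_val, best_alpha = val, alpha_a
        new_Gamma.append(best_alpha)
    return new_Gamma
```

Because the maximization is carried out only at the finitely many points in B, each backup returns at most |B| alpha vectors; this is what keeps the per-iteration cost polynomial, in contrast to exact value iteration, where the number of alpha vectors can grow exponentially with the observation set.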
