SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces

In Proc. Robotics: Science & Systems, 2008

Abstract—Motion planning in uncertain and dynamic environments is an essential capability for autonomous robots. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for solving such problems, but they are often avoided in robotics due to high computational complexity. Our goal is to create practical POMDP algorithms and software for common robotic tasks. To this end, we have developed a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency. In simulation, we successfully applied the algorithm to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, our algorithm substantially outperformed one of the fastest existing point-based algorithms. A software package implementing our algorithm will soon be released at http://motion.comp.nus.edu.sg/projects/pomdp/pomdp.html.
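Point-based POMDP solvers of the kind described in the abstract represent the value function as a set of alpha-vectors and perform Bellman backups only at sampled belief points. As a rough illustration of that shared backup step (a minimal sketch, not the paper's implementation; the function name, array layouts, and parameters below are assumptions for the example), a single point-based backup might look like this in Python:

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One point-based Bellman backup at belief b (illustrative sketch).

    b:     (S,) belief over states
    Gamma: non-empty list of (S,) alpha-vectors (current value function)
    T:     (A, S, S) transition probabilities, T[a, s, s'] = P(s' | s, a)
    O:     (A, S, Z) observation probabilities, O[a, s', z] = P(z | s', a)
    R:     (S, A) immediate rewards
    gamma: discount factor in (0, 1)
    Returns the new alpha-vector for b and the maximizing action.
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    best_alpha, best_val, best_a = None, -np.inf, None
    for a in range(A):
        alpha_a = R[:, a].astype(float)
        for z in range(Z):
            # g[s] = sum_{s'} P(z | s', a) P(s' | s, a) alpha(s'), one per alpha in Gamma
            g = [T[a] @ (O[a, :, z] * alpha) for alpha in Gamma]
            # keep the projection that is best at this particular belief point
            alpha_a = alpha_a + gamma * max(g, key=lambda v: b @ v)
        val = b @ alpha_a
        if val > best_val:
            best_alpha, best_val, best_a = alpha_a, val, a
    return best_alpha, best_a
```

The backup itself is standard across point-based methods; the contribution claimed in the abstract lies in choosing which beliefs to back up, namely those near the space reachable under optimal policies from the initial belief, rather than the full reachable belief space.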
