DESPOT: Online POMDP Planning with Regularization

POMDPs provide a principled framework for planning under uncertainty, but solving them is computationally intractable due to the "curse of dimensionality" and the "curse of history". This paper presents an online POMDP planning algorithm that alleviates these difficulties by focusing the search on a set of randomly sampled scenarios. A Determinized Sparse Partially Observable Tree (DESPOT) compactly captures the execution of all policies on these sampled scenarios. Our Regularized DESPOT (R-DESPOT) algorithm searches the DESPOT for a policy while optimally balancing the size of the policy against its estimated value under the sampled scenarios. We give an output-sensitive performance bound for all policies derived from a DESPOT and show that R-DESPOT works well when a small optimal policy exists. We also give an anytime algorithm that approximates R-DESPOT. Experiments show strong results compared with two of the fastest online POMDP planning algorithms. Source code and experimental settings are available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/.
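To make the scenario idea concrete, the sketch below illustrates, under assumptions, how a policy's value can be estimated by averaging its discounted return over a fixed set of determinized scenarios and then penalizing policy size, which is the quantity R-DESPOT trades off. The `Scenario` class, the toy `step` model, and the penalty weight `lam` are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of a scenario-based, regularized value estimate (assumed
# interface; not the DESPOT reference implementation).
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    state: int           # sampled initial state
    rng: random.Random   # fixed random stream that determinizes the run


def step(state, action, rng):
    """Hypothetical deterministic simulative model: with the scenario's
    random stream fixed, (state, action) yields one (next_state, obs, reward)."""
    noise = rng.random()
    next_state = state + action + (1 if noise < 0.1 else 0)
    obs = next_state % 2
    reward = -abs(next_state)
    return next_state, obs, reward


def empirical_value(policy, scenarios, depth, gamma=0.95):
    """Average discounted return of `policy` over the sampled scenarios."""
    total = 0.0
    for sc in scenarios:
        state, obs, ret, discount = sc.state, None, 0.0, 1.0
        for _ in range(depth):
            action = policy(obs)
            state, obs, reward = step(state, action, sc.rng)
            ret += discount * reward
            discount *= gamma
        total += ret
    return total / len(scenarios)


def regularized_value(policy, policy_size, scenarios, depth, lam=0.1):
    # R-DESPOT-style objective: empirical value minus a penalty on policy size.
    return empirical_value(policy, scenarios, depth) - lam * policy_size


# Usage: K sampled scenarios, evaluated with a trivial one-node policy.
K = 500
scenarios = [Scenario(state=0, rng=random.Random(seed)) for seed in range(K)]
print(regularized_value(lambda obs: 0, policy_size=1, scenarios=scenarios, depth=10))
```

In the full algorithm, the tree shares the same K scenarios across all action and observation branches, so one sampled set evaluates every policy contained in the DESPOT.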
