Robot Planning in Partially Observable Continuous Domains

We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case, and this complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP in a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed form. In this paper, we provide three main contributions to the literature on continuous-state POMDPs. First, we show that the optimal finite-horizon value function over the continuous, infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exactly. In contrast to the discrete case, in a continuous-state POMDP the α-functions may grow in size (e.g., in the number of Gaussian components) with each value iteration. Third, we show how recent point-based value iteration algorithms for discrete POMDPs can be extended to the continuous case, allowing for efficient planning in practical problems. In particular, we demonstrate Perseus, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem in a continuous domain, where encouraging results are observed.
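For readers less familiar with the continuous-state notation, the two central objects mentioned in the abstract can be sketched in standard POMDP form; the following equations are a conventional summary, not an excerpt from the paper. The belief over the continuous state space S, after taking action a and receiving observation o, is updated by a Bayes filter in which the sum over states of the discrete case becomes an integral:

  b^{a,o}(s') = \frac{p(o \mid s') \int_S p(s' \mid s, a)\, b(s)\, ds}{p(o \mid a, b)}

and the optimal finite-horizon value function, being piecewise linear and convex over the belief space, is represented by a finite set \Gamma_n of supporting α-functions:

  V_n(b) = \max_{\alpha \in \Gamma_n} \int_S \alpha(s)\, b(s)\, ds .

When, as the abstract assumes, the transition, observation, and reward models are all Gaussian mixtures, the products and integrals appearing in these expressions again yield Gaussian mixtures, which is why the belief updates and value iteration backups can be carried out analytically, at the cost of a number of mixture components that may grow with each backup.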
