Planning in partially-observable switching-mode continuous domains

Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal, state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm, which solves continuous-state POMDPs using this dynamics model. We also consider several procedures for approximating the value function as a mixture of a bounded number of Gaussians. Unlike most prior work on approximate continuous-state POMDP planners, we provide a formal analysis of the SM-POMDP algorithm and give bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation and on a robot navigation simulation in which the robot has faulty actuators demonstrate the benefit of SM-POMDP over a prior parametric approach.
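To make the switching-state idea concrete, here is a minimal one-dimensional Python sketch. It assumes, for illustration only and not necessarily as the paper's exact parameterization, that each mode's probability p(m | s) is a weighted sum of Gaussians over the state and that each mode applies a linear-Gaussian transition s' = s + offset + noise. Under these assumptions, pushing a Gaussian belief through one step yields a Gaussian mixture with one component per (mode, activation) pair, which shows why the number of Gaussians grows and must be bounded. The names Mode, predict, merge and reduce_mixture are hypothetical. reduce_mixture is a simple greedy moment-matching merge of the two components with the closest means; it is a stand-in, not necessarily one of the procedures the paper considers, and it assumes nonnegative weights (true for beliefs, but not in general for value-function components).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    weight: float
    mean: float
    var: float


@dataclass(frozen=True)
class Mode:
    # Hypothetical parameterization: p(m | s) ~ sum_h w_h * N(s; mu_h, var_h)
    activations: tuple  # tuple of Gaussian bumps over the state
    offset: float       # mean displacement s' - s in this mode
    noise_var: float    # transition noise variance in this mode


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predict(belief_mean, belief_var, modes):
    """Push a Gaussian belief N(s; belief_mean, belief_var) through one step.

    Each (mode, activation) pair yields one Gaussian, so the result is a
    mixture with as many components as there are activations in total.
    """
    out = []
    for mode in modes:
        for h in mode.activations:
            # Product of two Gaussians in s equals z * N(s; c_mean, c_var)
            z = gauss_pdf(belief_mean, h.mean, belief_var + h.var)
            c_var = 1.0 / (1.0 / belief_var + 1.0 / h.var)
            c_mean = c_var * (belief_mean / belief_var + h.mean / h.var)
            # Linear-Gaussian transition s' = s + offset + noise
            out.append(Gaussian(h.weight * z, c_mean + mode.offset,
                                c_var + mode.noise_var))
    total = sum(g.weight for g in out)
    # Renormalize: the activations need not sum to exactly 1 over modes
    return [Gaussian(g.weight / total, g.mean, g.var) for g in out]


def merge(a, b):
    """Moment-matching merge: keeps the pair's total weight, mean and variance."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Gaussian(w, mean, var)


def reduce_mixture(components, max_components):
    """Greedily merge the two closest-mean components until at most
    max_components remain (assumes nonnegative weights)."""
    if max_components < 1:
        raise ValueError("max_components must be at least 1")
    comps = list(components)
    while len(comps) > max_components:
        pairs = [(i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))]
        i, j = min(pairs, key=lambda p: abs(comps[p[0]].mean - comps[p[1]].mean))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Usage: an actuator that works on the left (mode 0) and stalls on the right (mode 1)
modes = (
    Mode(activations=(Gaussian(1.0, -2.0, 4.0),), offset=1.0, noise_var=0.1),
    Mode(activations=(Gaussian(1.0, 2.0, 4.0),), offset=0.0, noise_var=0.1),
)
pred = predict(0.0, 1.0, modes)   # two components
single = reduce_mixture(pred, 1)  # one moment-matched component
```

Because the two modes are symmetric about the belief mean s = 0, the belief splits evenly: predict returns two components with weight 0.5 each, means 0.6 and 0.4, and variance 0.9. Merging them gives a single component with mean 0.5 and variance 0.91, which preserves the mixture's first two moments.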