Paper information - Planning in partially-observable switching-mode continuous domains

Planning in partially-observable switching-mode continuous domains

Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm for solving continuous-state POMDPs using this dynamics model. We also consider several procedures to approximate the value function as a mixture of a bounded number of Gaussians. Unlike the majority of prior work on approximate continuous-state POMDP planners, we provide a formal analysis of our SM-POMDP algorithm, with bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation, and on a robot navigation simulation where the robot has faulty actuators, demonstrate the benefit of SM-POMDP over a prior parametric approach.
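To make the idea of state-dependent, multi-modal dynamics concrete, here is a minimal Python sketch of a switching-mode transition model for a one-dimensional robot. It is an illustration under stated assumptions, not the parameterization used in the paper: the two modes (`free`, `blocked`), the wall position, the logistic mode distribution, and all numeric values are hypothetical. Each mode is a linear-Gaussian model, and the probability of being in a mode depends on the current state.

```python
import math
import random

# Hypothetical two-mode model; names and numbers are illustrative assumptions.
# Each mode is linear-Gaussian: s' = a * s + b * u + N(0, noise_std^2).
MODES = {
    "free": {"a": 1.0, "b": 1.0, "noise_std": 0.05},     # action moves the robot
    "blocked": {"a": 1.0, "b": 0.0, "noise_std": 0.01},  # action has no effect
}


def mode_probabilities(state, wall=5.0, sharpness=4.0):
    """Assumed state-dependent mode distribution (logistic in distance to a wall)."""
    p_blocked = 1.0 / (1.0 + math.exp(-sharpness * (state - wall)))
    return {"free": 1.0 - p_blocked, "blocked": p_blocked}


def step(state, action, rng):
    """Sample a mode given the state, then sample the next state from that mode."""
    probs = mode_probabilities(state)
    mode = "blocked" if rng.random() < probs["blocked"] else "free"
    params = MODES[mode]
    noise = rng.gauss(0.0, params["noise_std"])
    next_state = params["a"] * state + params["b"] * action + noise
    return mode, next_state


if __name__ == "__main__":
    rng = random.Random(0)
    for start in (0.0, 5.0, 10.0):
        print(start, step(start, 1.0, rng))
```

Far from the assumed wall the `free` mode dominates and the action shifts the state; past the wall the `blocked` mode dominates and the state barely changes. A single linear model cannot express both behaviours at once, which is the limitation the abstract points to.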

[1] Reid G. Simmons, et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.

[2] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[3] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.

[4] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.

[5] Alexei Makarenko, et al. Parametric POMDPs for planning in continuous state spaces, 2006, Robotics Auton. Syst.

[6] John R. Hershey, et al. Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7] Guy Shani, et al. Forward Search Value Iteration for POMDPs, 2007, IJCAI.

[8] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.

[9] Jooyoung Park, et al. Universal Approximation Using Radial-Basis-Function Networks, 1991, Neural Computation.

[10] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[11] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[12] Terrence J. Sejnowski, et al. Variational Learning for Switching State-Space Models, 2001.

[13] Katie Byl, et al. Dynamically diverse legged locomotion for rough terrain, 2009, 2009 IEEE International Conference on Robotics and Automation.

[14] Michael C. Fu, et al. Solving Continuous-State POMDPs via Density Projection, 2010, IEEE Transactions on Automatic Control.

[15] James T. Kwok, et al. Simplifying Mixture Models Through Function Approximation, 2006, IEEE Transactions on Neural Networks.

[16] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.

[17] S. Kullback, et al. A lower bound for discrimination information in terms of variation (Corresp.), 1967, IEEE Trans. Inf. Theory.

[18] Brian C. Williams, et al. Model learning for switching linear systems with autonomous mode transitions, 2007, 2007 46th IEEE Conference on Decision and Control.

[19] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2011, J. Artif. Intell. Res.

[20] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.

[21] Jacob Goldberger, et al. Hierarchical Clustering of a Mixture Model, 2004, NIPS.

[22] Michael I. Jordan, et al. Nonparametric Bayesian Learning of Switching Linear Dynamical Systems, 2008, NIPS.

[23] Brian D. O. Anderson, et al. Linear Optimal Control, 1971.

[24] James M. Rehg, et al. Data-Driven MCMC for Learning and Inference in Switching Linear Dynamic Systems, 2005, AAAI.

[25] Huibert Kwakernaak, et al. Linear Optimal Control Systems, 1972.

[26] Frank L. Lewis, et al. Optimal Control, 1986.