In this paper, we propose a method for reducing the complexity of solving POMDPs in continuous state spaces by decomposing them into separate but coupled perceptual and decision processes, thereby shrinking the state space of the decision-learning problem. The full problem state space is split into a perceptual state space and a decision state space: the perceptual process estimates some aspects of the belief state, while the decision process estimates the remainder and determines a policy. As a result, the decision process can be modeled as a POMDP over a reduced state space. To make this applicable to continuous state spaces, both processes are handled by Monte Carlo sampling with sample set representations; the separation allows the decision POMDP to be represented with a smaller state space, which leads to smaller sample sets and, consequently, to reduced representational and decision-learning complexity. The goal is to focus decision learning on the aspects of the space that matter for decision making, while the observations and attributes needed to estimate the state of the decision process are handled separately by the perceptual process. We show analytically and experimentally how much the complexity of solving a POMDP can be reduced, thereby extending the range of decision-learning tasks that can be addressed.
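The decomposition described above can be illustrated with a minimal particle-based sketch. All models and names here are illustrative assumptions rather than the paper's implementation: a perceptual filter maintains samples over a nuisance attribute (here, an unknown sensor bias), while the decision process keeps a sample set over only the decision-relevant state (here, a 1-D position), so its belief representation stays small.

```python
import random

# Illustrative decomposition (an assumption, not the paper's models): the
# decision-relevant state is a 1-D position; a sensor bias is a perceptual
# attribute estimated outside the decision POMDP.

def perceptual_update(bias_particles, obs, pos_estimate, noise=0.5):
    """Resample bias hypotheses by how well obs - bias matches the
    current position estimate (a simple triangular likelihood)."""
    weights = [max(1e-9, noise - abs((obs - b) - pos_estimate))
               for b in bias_particles]
    return random.choices(bias_particles, weights=weights,
                          k=len(bias_particles))

def decision_update(pos_particles, action, obs, bias_estimate, noise=0.5):
    """Monte Carlo belief update over the reduced (position-only)
    decision state: propagate, weight, resample."""
    # Propagate through a noisy transition model.
    predicted = [p + action + random.gauss(0.0, 0.1) for p in pos_particles]
    # The perceptual result removes the bias before weighting.
    corrected_obs = obs - bias_estimate
    weights = [max(1e-9, noise - abs(corrected_obs - p)) for p in predicted]
    return random.choices(predicted, weights=weights, k=len(pos_particles))
```

Because the decision sample set covers only the position dimension, it can be far smaller than a sample set over the joint (position, bias) space, which is the source of the complexity reduction the paper quantifies.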