Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) are powerful frameworks for modeling decision making and decision learning tasks across a wide range of problem domains, and they are widely used in complex real-world settings such as robot control. This modeling power and generality come at a cost, however: the complexity of the underlying model and of the corresponding algorithms grows dramatically as the complexity of the task domain increases. To address this issue for tasks in which raw sensory features form the basis for complex decision making, this paper presents an integrated and adaptive approach that aims to reduce the complexity of the decision learning problem by separating the POMDP model into a perceptual process and a decision process. In the proposed framework, a sampling method handles the perceptual process and reinforcement learning handles the decision process. Handling the two processes separately should make it easier to extract relevant perceptual information and to focus the decision process on relevant state attributes. This, in turn, is intended to let the framework scale to problems for which traditional POMDP methods are intractable. We show and discuss the effectiveness of our method both analytically and empirically.
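To make the proposed separation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical two-state toy task with a noisy sensor, uses a small particle filter as a stand-in for the sampling-based perceptual process, and uses tabular Q-learning as a stand-in for the reinforcement learning decision process. All names, rewards, and parameters (`SENSOR_ACCURACY`, `belief_feature`, `train`, the reward values) are assumptions chosen for illustration.

```python
import random

# Hypothetical toy task (an assumption, not from the paper):
# a hidden state in {0, 1} is seen only through a noisy sensor.
STATES = (0, 1)
ACTIONS = (0, 1, 2)       # 0: commit to state 0, 1: commit to state 1, 2: sense again
SENSOR_ACCURACY = 0.85    # assumed probability that the sensor reports the true state


def observe(state, rng):
    """Return a noisy observation of the hidden state."""
    return state if rng.random() < SENSOR_ACCURACY else 1 - state


def likelihood(obs, state):
    """Probability of seeing `obs` when the hidden state is `state`."""
    return SENSOR_ACCURACY if obs == state else 1.0 - SENSOR_ACCURACY


def resample(particles, obs, rng):
    """Perceptual process: importance-resample state particles given an observation."""
    weights = [likelihood(obs, s) for s in particles]
    return rng.choices(particles, weights=weights, k=len(particles))


def belief_feature(particles, bins=5):
    """Compress the particle set into one discrete feature for the decision process."""
    p1 = sum(particles) / len(particles)      # estimated probability of state 1
    return min(int(p1 * bins), bins - 1)


def train(episodes=3000, n_particles=50, bins=5,
          alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Decision process: tabular Q-learning over the compressed belief feature."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(bins)]
    for _ in range(episodes):
        state = rng.choice(STATES)
        particles = [rng.choice(STATES) for _ in range(n_particles)]
        feat = belief_feature(particles, bins)
        for _ in range(20):                   # cap on sensing steps per episode
            if rng.random() < eps:            # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[feat][a])
            if action == 2:
                # Sensing costs a little and updates only the perceptual process.
                particles = resample(particles, observe(state, rng), rng)
                next_feat = belief_feature(particles, bins)
                target = -1.0 + gamma * max(q[next_feat])
                q[feat][action] += alpha * (target - q[feat][action])
                feat = next_feat
            else:
                # Committing ends the episode with an assumed reward or penalty.
                reward = 10.0 if action == state else -20.0
                q[feat][action] += alpha * (reward - q[feat][action])
                break
    return q


if __name__ == "__main__":
    for row in train():
        print([round(v, 2) for v in row])
```

In this sketch the learner never sees raw observations or the full particle set, only the low-dimensional feature produced by the perceptual process, so the Q-table has just `bins * len(ACTIONS)` entries regardless of how many particles are used. That is one simple way the separation can keep the decision problem small.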