Exact and approximate algorithms for partially observable Markov decision processes
Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, yet computing optimal decision policies also becomes more complex; the more sources of uncertainty there are, the harder the problem is to solve. In this work, we examine sequential decision making in environments where actions have probabilistic outcomes and the system state is only partially observable. We focus on the partially observable Markov decision process (POMDP) model and explore algorithms for computing both optimal and approximate policies for controlling processes modeled as POMDPs.
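Because the state is only partially observable, a POMDP controller never acts on the state directly; it maintains a belief, a probability distribution over states, and revises it by Bayes' rule after every action and observation. The sketch below is a minimal illustration of this standard belief update, assuming dense NumPy arrays for the models; the function name and array shapes are our own illustrative choices, not notation from the thesis.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes-filter belief update for a discrete POMDP (illustrative).

    b : (S,)      current belief over hidden states
    T : (A, S, S) transitions, T[a, s, s'] = P(s' | s, a)
    Z : (A, S, O) observations, Z[a, s', o] = P(o | a, s')
    Returns the posterior belief after taking action a and observing o.
    """
    predicted = b @ T[a]                   # predict: P(s' | b, a)
    unnormalized = Z[a, :, o] * predicted  # correct: weight by P(o | a, s')
    return unnormalized / unnormalized.sum()
```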
Although computing the optimal policy is PSPACE-complete (or worse), studying and improving exact algorithms lends insight into the structure of optimal solutions and provides a basis for approximate solutions. We present improvements, analysis, and empirical comparisons for both existing and novel approaches to computing the optimal POMDP policy exactly.
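For context, exact algorithms exploit the fact that the finite-horizon value function over beliefs is piecewise linear and convex, representable by a finite set of alpha vectors. The following sketch shows one exhaustive dynamic-programming backup under that representation, again assuming dense NumPy models; exact methods differ chiefly in how they prune the exponentially growing vector set, which this hypothetical sketch omits.

```python
import itertools
import numpy as np

def exhaustive_backup(alphas, T, Z, R, gamma):
    """One exact DP backup over alpha vectors (illustrative, unpruned).

    alphas : list of (S,) vectors representing the current value function
    T : (A, S, S) transitions, Z : (A, S, O) observations,
    R : (A, S) expected immediate rewards, gamma : discount factor.
    Returns the unpruned alpha-vector set for the next horizon.
    """
    A, S, num_obs = Z.shape
    new_alphas = []
    for a in range(A):
        # g[o][i](s) = sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[i](s')
        g = [[T[a] @ (Z[a, :, o] * alpha) for alpha in alphas]
             for o in range(num_obs)]
        # Cross-sum: one new vector per choice of successor vector
        # for each possible observation.
        for choice in itertools.product(range(len(alphas)), repeat=num_obs):
            vec = R[a] + gamma * sum(g[o][choice[o]] for o in range(num_obs))
            new_alphas.append(vec)
    return new_alphas
```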
Since deriving close approximations to the optimal POMDP solution is also hard (NP-complete or worse), we consider a number of approaches for deriving policies that yield sub-optimal control and empirically explore their performance on a range of problems. These approaches borrow and extend ideas from several areas, ranging from the more mathematically motivated techniques of reinforcement learning and control theory to entirely heuristic control rules.
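One well-known heuristic of this kind, often associated with this line of work, solves the underlying fully observable MDP and then acts greedily on belief-weighted Q-values, in effect pretending all state uncertainty vanishes after one step (the QMDP idea). The sketch below illustrates it under the same assumed array shapes; the names, shapes, and iteration count are our own assumptions.

```python
import numpy as np

def mdp_q_values(T, R, gamma, iters=500):
    """Q-values of the underlying fully observable MDP via value iteration.
    T : (A, S, S) transitions, R : (A, S) expected immediate rewards."""
    A, S, _ = T.shape
    V = np.zeros(S)
    for _ in range(iters):
        V = (R + gamma * (T @ V)).max(axis=0)  # Bellman backup on V
    return R + gamma * (T @ V)                 # final (A, S) Q-table

def qmdp_action(b, Q):
    """Score each action by its belief-weighted Q-value and act greedily."""
    return int(np.argmax(Q @ b))
```

Such a rule is cheap to compute but ignores the value of information, so it never takes actions purely to reduce uncertainty; this trade-off is exactly what the empirical comparisons across problems are meant to expose.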