The Complexity of Deterministically Observable Finite-horizon Markov Decision Processes

We consider the complexity of the decision problem for different types of partially observable Markov decision processes (MDPs): given an MDP, does there exist a policy with performance > 0? Lower and upper bounds on the complexity of the decision problems are shown in terms of completeness for NL, P, NP, PSPACE, EXP, NEXP, or EXPSPACE, depending on the type of the Markov decision process. For several NP-complete types, we show that they are not even polynomial-time ε-approximable for any fixed ε, unless P = NP. These results also reveal interesting trade-offs between the power of policies, observations, and rewards. Topics: computational and structural complexity, computational issues in A.I.
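The decision problem stated above can be illustrated with a minimal sketch. The following toy example (all states, transitions, and rewards are hypothetical, not taken from the paper) enumerates stationary observation-based policies for a small finite-horizon MDP with a deterministic observation function and asks whether any policy achieves expected total reward greater than 0:

```python
import itertools

# Hypothetical toy finite-horizon MDP with deterministic observations.
# obs maps each state to exactly one observation (deterministic observability);
# T[(s, a)] lists (probability, next_state) pairs; R[(s, a)] is the reward.
states = [0, 1]
actions = [0, 1]
horizon = 2
obs = {0: 'a', 1: 'b'}
T = {(0, 0): [(1.0, 0)], (0, 1): [(0.5, 0), (0.5, 1)],
     (1, 0): [(1.0, 1)], (1, 1): [(1.0, 0)]}
R = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.0}

def value(policy, s, t):
    """Expected total reward from state s with t steps remaining."""
    if t == 0:
        return 0.0
    a = policy[obs[s]]
    return R[(s, a)] + sum(p * value(policy, s2, t - 1) for p, s2 in T[(s, a)])

def exists_positive_policy(start=0):
    """Decide whether some observation-based policy has performance > 0."""
    observations = sorted(set(obs.values()))
    for choice in itertools.product(actions, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        if value(policy, start, horizon) > 0:
            return True
    return False

print(exists_positive_policy())  # → True for this toy MDP
```

Brute-force enumeration like this takes time exponential in the number of observations, which is consistent with the NP-hardness results: the hardness stems from searching the policy space, while evaluating a fixed policy is easy.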