Purely Epistemic Markov Decision Processes

Planning under uncertainty involves two distinct sources of uncertainty: uncertainty about the effects of actions and uncertainty about the current state of the world. The most widely developed model that handles both sources of uncertainty is the Partially Observable Markov Decision Process (POMDP). Removing the second source of uncertainty from POMDPs yields the well-known framework of fully observable MDPs. Removing the first source instead yields a less widely studied framework: decision processes in which actions cannot change the state of the world and serve only to gather information about its (static) state. Such "purely epistemic" processes are highly relevant, since many practical problems (such as diagnosis, database querying, and preference elicitation) fall into this class. However, it is not known whether this restriction of POMDPs is computationally simpler than the general model. In this paper we establish several complexity results for purely epistemic MDPs (EMDPs). We first show that short-horizon policy existence in EMDPs is PSPACE-complete. We then focus on the special case of EMDPs with reliable observations and show that policy existence becomes "only" NP-complete; however, this problem cannot be approximated within a bounded performance ratio by any polynomial-time algorithm.
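To make the model concrete, the following is a minimal sketch of an EMDP in Python: the hidden state is static, each action only yields a noisy observation, and so belief tracking reduces to Bayesian conditioning with no transition step. All state names, action names, and probabilities below are hypothetical illustrations, not taken from the paper.

```python
import random

# Hypothetical toy EMDP: a hidden fault in {"f1", "f2", "f3"} never changes;
# each "test" action returns a noisy observation about it.
STATES = ["f1", "f2", "f3"]

# OBS_PROB[action][state][observation] = P(observation | state, action).
# Values are illustrative only.
OBS_PROB = {
    "test_a": {
        "f1": {"pos": 0.9, "neg": 0.1},
        "f2": {"pos": 0.2, "neg": 0.8},
        "f3": {"pos": 0.5, "neg": 0.5},
    },
    "test_b": {
        "f1": {"pos": 0.1, "neg": 0.9},
        "f2": {"pos": 0.9, "neg": 0.1},
        "f3": {"pos": 0.5, "neg": 0.5},
    },
}

def update_belief(belief, action, observation):
    """Bayesian belief update: because actions are purely epistemic,
    the hidden state is static and no transition step is needed."""
    posterior = {s: belief[s] * OBS_PROB[action][s][observation] for s in STATES}
    z = sum(posterior.values())  # marginal probability of the observation
    return {s: p / z for s, p in posterior.items()}

def sample_observation(true_state, action):
    """Draw an observation from P(. | true_state, action)."""
    dist = OBS_PROB[action][true_state]
    r, acc = random.random(), 0.0
    for obs, p in dist.items():
        acc += p
        if r < acc:
            return obs
    return obs  # guard against floating-point rounding

if __name__ == "__main__":
    true_state = "f2"
    belief = {s: 1 / len(STATES) for s in STATES}  # uniform prior
    for action in ["test_a", "test_b", "test_a"]:
        obs = sample_observation(true_state, action)
        belief = update_belief(belief, action, obs)
        print(action, obs, {s: round(p, 3) for s, p in belief.items()})
```

Running the loop shows the belief concentrating on the true (static) state as observations accumulate, which is exactly the kind of information-gathering behavior an EMDP policy must plan for.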