NMRDPP: Decision-Theoretic Planning with Control Knowledge

We discuss NMRDPP, a system for solving decision processes with non-Markovian reward. More specifically, the target decision processes exhibit Markovian dynamics, and rewarding behaviours are modelled as state trajectories specified in a linear temporal logic. In addition to implementing structured, tabular and online MDP solution algorithms, NMRDPP can exploit domain-specific control knowledge: state trajectories which violate the user's knowledge or intuition regarding useful dynamics can be pruned from consideration by the MDP solution algorithm. Thus, in addition to facilitating concise specification of complex reward structures, NMRDPP can be used to greatly speed up policy computation for propositional MDPs. To our knowledge, NMRDPP is the only implementation of solution algorithms designed to solve decision processes with non-Markovian rewards.
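To make the two central ideas concrete, the following minimal Python sketch illustrates (i) a reward that depends on the state trajectory rather than on the current state alone, and (ii) pruning of trajectories that violate a control rule. It is an illustration under stated assumptions only: it does not reproduce NMRDPP's input language or its solution algorithms, and all names (`first_time_reward`, `violates_control`, `trajectory_value`, and the propositions `goal` and `unsafe`) are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence

# Assumption: a state is the set of propositions that hold in it.
State = FrozenSet[str]


def first_time_reward(prefix: Sequence[State], prop: str = "goal",
                      value: float = 1.0) -> float:
    """Reward the last state of a non-empty prefix only if `prop` holds
    there and held in no earlier state (a history-dependent reward)."""
    *history, current = prefix
    if prop in current and all(prop not in s for s in history):
        return value
    return 0.0


def violates_control(prefix: Sequence[State], forbidden: str = "unsafe") -> bool:
    """Hypothetical control rule: the trajectory must never reach a state
    in which `forbidden` holds."""
    return forbidden in prefix[-1]


def trajectory_value(trajectory: Sequence[State],
                     gamma: float = 0.9) -> Optional[float]:
    """Discounted sum of history-dependent rewards along a trajectory,
    or None if the trajectory is pruned by the control rule."""
    total = 0.0
    for t in range(len(trajectory)):
        prefix = trajectory[: t + 1]
        if violates_control(prefix):
            return None  # pruned: never considered further
        total += (gamma ** t) * first_time_reward(prefix)
    return total


empty: State = frozenset()
goal: State = frozenset({"goal"})
unsafe: State = frozenset({"unsafe"})

# The same state `goal` is rewarded on its first visit but not on its second,
# so the reward cannot be written as a function of the current state alone.
print(trajectory_value([empty, goal, empty, goal]))
# A trajectory passing through an `unsafe` state is pruned.
print(trajectory_value([empty, unsafe, goal]))
```

In this sketch the reward function takes a whole prefix as its argument, which is what makes it non-Markovian, while the dynamics themselves are left unspecified and could remain Markovian. The control rule acts only as a filter on trajectories and does not change the reward.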