An Heuristic for Multi-Dimensional Markov Decision Processes

Abstract An heuristic procedure is presented for multi-dimensional Markov decision processes in cases where existing procedures for approximating optimal policies are computationally demanding. It is applicable when certain simple policies are easy to evaluate, and it does not require evaluation of the improved policies. Bounds on the loss of optimality arising from such policies are given and are used to accept or reject any policy derived.
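
The abstract does not specify a particular algorithm. As one illustration of the general idea only, the Python sketch below evaluates a simple base policy exactly, performs a single policy-improvement step without evaluating the improved policy, and accepts or rejects the result using the standard discounted-MDP Bellman-residual bound 2 * ||T V - V||_inf / (1 - gamma) on the loss of optimality. The tabular setting, the function names, and this particular bound are illustrative assumptions and are not taken from the paper.

import numpy as np

def evaluate_policy(P, r, policy, gamma):
    # Exact evaluation of a deterministic stationary policy on a tabular
    # discounted MDP: solve (I - gamma * P_pi) V = r_pi.
    # P: (A, S, S) transition matrices, r: (S, A) rewards,
    # policy: (S,) action indices, gamma: discount factor in (0, 1).
    S = r.shape[0]
    P_pi = P[policy, np.arange(S), :]   # next-state distributions under the policy
    r_pi = r[np.arange(S), policy]      # one-step rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def improve_and_bound(P, r, base_policy, gamma, tolerance):
    # One policy-improvement step over an easily evaluated base policy.
    # The improved policy itself is never evaluated; its loss of optimality
    # is bounded via the Bellman residual of the base policy's value
    # function, and the policy is accepted or rejected against `tolerance`.
    V = evaluate_policy(P, r, base_policy, gamma)
    Q = r + gamma * np.einsum("asn,n->sa", P, V)   # Q(s, a) computed from V
    improved_policy = Q.argmax(axis=1)
    residual = np.max(np.abs(Q.max(axis=1) - V))   # ||T V - V||_inf
    loss_bound = 2.0 * residual / (1.0 - gamma)    # bound on loss of optimality
    return improved_policy, loss_bound, loss_bound <= tolerance

if __name__ == "__main__":
    # Small random MDP as a usage example (hypothetical data).
    rng = np.random.default_rng(0)
    A, S = 3, 5
    P = rng.dirichlet(np.ones(S), size=(A, S))     # (A, S, S) transition matrices
    r = rng.random((S, A))
    base_policy = np.zeros(S, dtype=int)           # a simple, easily evaluated policy
    policy, bound, accepted = improve_and_bound(P, r, base_policy, gamma=0.9, tolerance=1.0)
    print(policy, bound, accepted)

The accept/reject step mirrors the abstract's use of optimality-loss bounds: the improved policy is kept only when the bound certifies that its value is within the stated tolerance of optimal, so no further evaluation of that policy is needed.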