Fast direct policy evaluation using multiscale analysis of Markov diffusion processes

Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring <i>O</i>(|<i>S</i>|<sup>3</sup>) time to solve the Bellman system of |<i>S</i>| linear equations directly (where |<i>S</i>| is the size of the state space in the discrete case, and the sample size in the continuous case). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy π, this framework efficiently constructs a multiscale decomposition of the random walk <i>P</i><sup>π</sup> associated with π. This enables the efficient computation of medium- and long-term state distributions, the approximation of value functions, and the <i>direct</i> computation of the potential operator (<i>I</i> - γ<i>P</i><sup>π</sup>)<sup>-1</sup> needed to solve the Bellman equation. We show that even a preliminary, non-optimized version of the solver is competitive with highly optimized iterative techniques, achieving in many cases a complexity of <i>O</i>(|<i>S</i>|).
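For reference, the following is a minimal NumPy sketch of the conventional <i>O</i>(|<i>S</i>|<sup>3</sup>) direct solve that the abstract takes as its baseline, not the multiscale algorithm proposed in the paper; the function name, discount factor, and toy three-state chain are illustrative assumptions.

<pre><code>import numpy as np

def evaluate_policy_direct(P_pi, R, gamma=0.95):
    """Baseline direct policy evaluation: solve (I - gamma * P_pi) V = R.

    P_pi : (|S|, |S|) row-stochastic transition matrix under policy pi.
    R    : (|S|,) expected one-step reward under pi.
    gamma: discount factor in [0, 1).

    Solving this dense linear system costs O(|S|^3), which is the cost
    the multiscale construction described above is designed to avoid.
    """
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R)

# Tiny illustrative 3-state chain (arbitrary row-stochastic P_pi).
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.1, 0.9]])
R = np.array([0.0, 0.0, 1.0])
V = evaluate_policy_direct(P_pi, R, gamma=0.9)
print(V)  # value of each state under the fixed policy
</code></pre>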