Fast direct policy evaluation using multiscale analysis of Markov diffusion processes

Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring <i>O</i>(|<i>S</i>|<sup>3</sup>) time to solve the Bellman system of |<i>S</i>| linear equations directly (where |<i>S</i>| is the size of the state space in the discrete case, and the sample size in the continuous case). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy π, this framework efficiently constructs a multiscale decomposition of the random walk <i>P</i><sup>π</sup> associated with π. This enables the efficient computation of medium- and long-term state distributions, the approximation of value functions, and the <i>direct</i> computation of the potential operator (<i>I</i> - γ<i>P</i><sup>π</sup>)<sup>-1</sup> needed to solve the Bellman equation. We show that even a preliminary, non-optimized version of the solver is competitive with highly optimized iterative techniques, achieving in many cases a complexity of <i>O</i>(|<i>S</i>|).
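For reference, the following is a minimal NumPy sketch of the conventional <i>O</i>(|<i>S</i>|<sup>3</sup>) direct solve that the abstract takes as its baseline, not the multiscale algorithm proposed in the paper; the function name, discount factor, and toy three-state chain are illustrative assumptions.

<pre><code>import numpy as np

def evaluate_policy_direct(P_pi, R, gamma=0.95):
    """Baseline direct policy evaluation: solve (I - gamma * P_pi) V = R.

    P_pi : (|S|, |S|) row-stochastic transition matrix under policy pi.
    R    : (|S|,) expected one-step reward under pi.
    gamma: discount factor in [0, 1).

    Solving this dense linear system costs O(|S|^3), which is the cost
    the multiscale construction described above is designed to avoid.
    """
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R)

# Tiny illustrative 3-state chain (arbitrary row-stochastic P_pi).
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.1, 0.9]])
R = np.array([0.0, 0.0, 1.0])
V = evaluate_policy_direct(P_pi, R, gamma=0.9)
print(V)  # value of each state under the fixed policy
</code></pre>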