Proto-value functions: Developmental reinforcement learning

This paper presents a novel framework called proto-reinforcement learning (PRL), based on a mathematical model of a proto-value function: these are task-independent basis functions that form the building blocks of all value functions on a given state space manifold. Proto-value functions are learned not from rewards but from analyzing the topology of the state space. Formally, proto-value functions are Fourier eigenfunctions of the Laplace-Beltrami diffusion operator on the state space manifold. Proto-value functions facilitate structural decomposition of large state spaces and form geodesically smooth orthonormal basis functions for approximating any value function. The theoretical basis for proto-value functions combines insights from spectral graph theory, harmonic analysis, and Riemannian manifolds. Proto-value functions enable a new class of algorithms called representation policy iteration, unifying the learning of representation and behavior.