Paper information - Geodesic Gaussian kernels for value function approximation

Least-squares policy iteration approximates value functions efficiently when given appropriate basis functions. The Gaussian kernel is a popular and useful choice of basis function because of its smoothness. However, it cannot represent the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploit the non-linear manifold structure induced by the Markov decision process. We demonstrate the usefulness of the proposed method in simulated robot arm control and Khepera robot navigation.
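To make the idea concrete, the following is a minimal sketch of a geodesic Gaussian kernel on a toy grid world. It is not the authors' implementation. It assumes the kernel takes the form exp(-d² / (2σ²)), where d is the shortest-path distance on a graph over the states, and that this distance is computed with Dijkstra's algorithm. The grid layout, unit edge costs, and the helper names `build_grid`, `shortest_path_distances`, and `geodesic_gaussian_kernel` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def build_grid(width, height, walls):
    """Hypothetical 4-connected grid world; `walls` is a set of blocked cells."""
    neighbors = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in walls:
                continue
            adjacent = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                inside = 0 <= cell[0] < width and 0 <= cell[1] < height
                if inside and cell not in walls:
                    adjacent.append((cell, 1.0))  # assumed unit step cost
            neighbors[(x, y)] = adjacent
    return neighbors


def shortest_path_distances(neighbors, source):
    """Dijkstra's algorithm over a dict: node -> list of (node, cost)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, cost in neighbors[u]:
            candidate = d + cost
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def geodesic_gaussian_kernel(neighbors, center, sigma):
    """Assumed kernel form: exp(-d^2 / (2 sigma^2)) with d the graph distance."""
    dist = shortest_path_distances(neighbors, center)
    return {s: math.exp(-d * d / (2.0 * sigma ** 2)) for s, d in dist.items()}


# A 5x5 grid with a vertical wall at x = 2 that leaves a gap only at y = 4.
walls = {(2, y) for y in range(4)}
grid = build_grid(5, 5, walls)
center, target = (1, 0), (3, 0)  # close in Euclidean terms, far along the graph
sigma = 2.0

k_geodesic = geodesic_gaussian_kernel(grid, center, sigma)[target]
k_euclidean = math.exp(-math.dist(center, target) ** 2 / (2.0 * sigma ** 2))
print(k_geodesic, k_euclidean)
```

In this sketch the two cells on either side of the wall are two units apart in Euclidean distance but ten steps apart along the graph, so the ordinary Gaussian kernel assigns them a high similarity while the geodesic version assigns them a value near zero. That gap is what lets a basis built this way follow a sharp change in value across an obstacle.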
