Geodesic Gaussian kernels for value function approximation

The least-squares policy iteration approach works efficiently for value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice of basis function. However, it cannot represent the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose new basis functions based on geodesic Gaussian kernels, which exploit the non-linear manifold structure induced by the Markov decision process. The usefulness of the proposed method is successfully demonstrated in simulated robot-arm control and Khepera robot navigation.
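
The idea can be illustrated with a minimal sketch: instead of measuring Euclidean distance between a state and a kernel center, the kernel is evaluated on the shortest-path (geodesic) distance over the state-transition graph, so the basis functions respect walls and other discontinuities. The code below is an illustrative sketch, not the authors' implementation; the function name, the use of a dense adjacency matrix, and the toy graph are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra


def geodesic_gaussian_basis(adjacency, centers, sigma):
    """Geodesic Gaussian kernel features for every state (illustrative sketch).

    adjacency : (n_states, n_states) array of edge costs on the state graph
                (0 where no direct transition exists).
    centers   : list of state indices used as kernel centers.
    sigma     : kernel width.
    Returns an (n_states, n_centers) design matrix Phi.
    """
    graph = csr_matrix(adjacency)
    # Shortest-path (geodesic) distances from each center to all states.
    dist = dijkstra(graph, directed=False, indices=centers)  # (n_centers, n_states)
    # Gaussian kernel applied to geodesic rather than Euclidean distance;
    # unreachable states (infinite distance) get a feature value of 0.
    return np.exp(-dist.T ** 2 / (2.0 * sigma ** 2))


# Toy example: a 5-state space split into two components (a "wall" between
# states 2 and 3), so the geodesic features reflect the discontinuity.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
Phi = geodesic_gaussian_basis(A, centers=[0, 4], sigma=1.0)
print(Phi.shape)  # (5, 2)
```

Such a feature matrix could then be plugged into least-squares policy iteration in place of ordinary (Euclidean) Gaussian kernel features.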
