Manifold-based non-parametric learning of action-value functions

Finding good approximations to state-action value functions is a central problem in model-free on-line reinforcement learning. The use of non-parametric function approximators enables us to simultaneously represent the model and our confidence in it. Since Q functions are usually discontinuous, we present a novel Gaussian process (GP) kernel function to cope with discontinuity. We use a manifold-based distance measure in our kernels, the manifold being induced by the graph structure extracted from data. In on-line learning, graph construction proceeds in parallel with the estimation algorithm. This results in a compact and efficient graph structure, eliminates the need for a predefined function class, and improves the accuracy of the estimated value functions, as tested on simulated robotic control tasks.
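To make the core idea concrete, the sketch below illustrates one common way to build such a kernel: construct a k-nearest-neighbour graph over visited points, take graph shortest-path (geodesic) distances as the manifold distance, and substitute them into a squared-exponential kernel. This is a minimal illustration, not the paper's exact construction; the function names, the neighbourhood size `k`, and the batch (rather than on-line) graph construction are assumptions, and substituting geodesic distances into an RBF kernel is not guaranteed to yield a positive-definite kernel in general, so a jitter term is added before solving.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def knn_graph(points, k=5):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights.

    Non-edges are marked with np.inf, which scipy's csgraph routines
    treat as absent edges in dense input.
    """
    d = cdist(points, points)
    graph = np.full_like(d, np.inf)
    for i in range(len(points)):
        nbrs = np.argsort(d[i])[1:k + 1]   # skip the point itself
        graph[i, nbrs] = d[i, nbrs]
    return np.minimum(graph, graph.T)      # symmetrize

def manifold_kernel(points, k=5, length_scale=1.0):
    """Squared-exponential kernel over graph geodesic distances.

    Distances along the data graph approximate distances on the
    underlying manifold, so points separated by a discontinuity
    (far apart on the graph) get near-zero covariance even if they
    are close in Euclidean terms.
    """
    geo = shortest_path(knn_graph(points, k), method="D")  # Dijkstra
    return np.exp(-(geo / length_scale) ** 2)  # inf distance -> 0 covariance

# Toy usage: smooth noisy value observations with a GP posterior mean.
# The (state, action) points and targets here are synthetic stand-ins.
X = np.random.rand(40, 2)
y = np.sin(3 * X[:, 0])
K = manifold_kernel(X, k=5, length_scale=0.5)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)  # jitter for stability
q_hat = K @ alpha  # smoothed value estimates at the training points
```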