Learning Representation and Control in Continuous Markov Decision Processes

This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on proto-value functions, in which the basis functions underlying the representation are derived automatically from a spectral analysis of the state space manifold: the proto-value functions are the eigenfunctions of the graph Laplacian. We describe how to extend these eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, with the value function approximated in the subspace spanned by the learned proto-value functions. A detailed set of experiments on classic benchmark tasks, including the inverted pendulum and the mountain car, characterizes the sensitivity of performance to various parameters and compares the method with a parametric radial basis function approach.
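To make the pipeline concrete, here is a minimal sketch of proto-value function construction and the Nyström extension, assuming sampled low-dimensional states and a Gaussian-weighted k-nearest-neighbor graph; the parameter names and default values (k, sigma, num_pvfs) are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def build_pvfs(states, k=10, sigma=0.5, num_pvfs=20):
    """Proto-value functions: the smoothest generalized eigenvectors
    of the graph Laplacian L = D - W over the sampled states."""
    d = cdist(states, states)                 # pairwise Euclidean distances
    W = np.exp(-d ** 2 / (2 * sigma ** 2))    # Gaussian edge weights
    np.fill_diagonal(W, 0.0)                  # no self-loops
    # Sparsify: keep each state's k nearest neighbors, then symmetrize.
    far = np.argsort(d, axis=1)[:, k + 1:]
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Solve L phi = lambda D phi: small-eigenvalue solutions vary
    # slowly over the graph and serve as basis functions.
    eigvals, eigvecs = eigh(L, D)
    return eigvals[:num_pvfs], eigvecs[:, :num_pvfs]

def nystrom_extend(x, states, eigvals, eigvecs, sigma=0.5):
    """Nyström extension of each eigenfunction to a novel state x:
    phi_j(x) ~ sum_i w(x, s_i) phi_j(s_i) / (1 - lambda_j),
    with w(x, .) the normalized affinities to the sampled states."""
    w = np.exp(-cdist(x[None, :], states) ** 2 / (2 * sigma ** 2)).ravel()
    w /= w.sum()
    return (w @ eigvecs) / np.maximum(1.0 - eigvals, 1e-8)
```

The graph parameters matter in practice: sigma controls how quickly affinities decay with distance, and k controls graph sparsity; the abstract's sensitivity experiments probe exactly this kind of choice.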
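The control side of the framework is least-squares policy iteration over these features. Below is a hedged sketch of the LSTDQ inner solve and the outer policy-iteration loop, assuming a batch of (s, a, r, s_next, done) tuples and state-action features that copy the state basis into a per-action block; the helper make_phi, the regularization constant, and the convergence tolerance are all illustrative, not the paper's implementation.

```python
import numpy as np

def make_phi(state_basis, num_actions):
    """State-action features: copy the state features (e.g. Nyström-extended
    proto-value functions) into the block for the taken action."""
    def phi(s, a):
        f = state_basis(s)
        out = np.zeros(len(f) * num_actions)
        out[a * len(f):(a + 1) * len(f)] = f
        return out
    return phi

def lstdq(samples, phi, dim, policy, gamma=0.99):
    """One LSTDQ solve: fit w so that Q(s, a) = phi(s, a) @ w under `policy`."""
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for (s, a, r, s_next, done) in samples:
        f = phi(s, a)
        f_next = 0.0 if done else phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(dim), b)  # small ridge term

def lspi(samples, phi, dim, num_actions, iters=20, gamma=0.99):
    """Outer loop: alternate greedy policy extraction and LSTDQ evaluation."""
    w = np.zeros(dim)
    for _ in range(iters):
        policy = lambda s: max(range(num_actions), key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, dim, policy, gamma)
        if np.allclose(w, w_new, atol=1e-4):
            break
        w = w_new
    return w
```

The block structure of the features lets a single weight vector represent one Q-function per action, which is the standard construction for discrete-action LSPI on benchmarks like the inverted pendulum and mountain car.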
