Learning Representation and Control in Continuous Markov Decision Processes

This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on proto-value functions, in which the basis functions underlying the representation are derived automatically from a spectral analysis of the state space manifold: the proto-value functions are the eigenfunctions of the graph Laplacian. We describe how to extend these eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, with the value function approximated in the subspace spanned by the learned proto-value functions. A detailed set of experiments on classic benchmark tasks, including the inverted pendulum and the mountain car, characterizes the sensitivity of performance to various parameters and compares the method with a parametric radial basis function approach.
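To make the pipeline concrete, here is a minimal sketch of proto-value function construction and the Nyström extension, assuming sampled low-dimensional states and a Gaussian-weighted k-nearest-neighbor graph; the parameter names and default values (k, sigma, num_pvfs) are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def build_pvfs(states, k=10, sigma=0.5, num_pvfs=20):
    """Proto-value functions: the smoothest generalized eigenvectors
    of the graph Laplacian L = D - W over the sampled states."""
    d = cdist(states, states)                 # pairwise Euclidean distances
    W = np.exp(-d ** 2 / (2 * sigma ** 2))    # Gaussian edge weights
    np.fill_diagonal(W, 0.0)                  # no self-loops
    # Sparsify: keep each state's k nearest neighbors, then symmetrize.
    far = np.argsort(d, axis=1)[:, k + 1:]
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Solve L phi = lambda D phi: small-eigenvalue solutions vary
    # slowly over the graph and serve as basis functions.
    eigvals, eigvecs = eigh(L, D)
    return eigvals[:num_pvfs], eigvecs[:, :num_pvfs]

def nystrom_extend(x, states, eigvals, eigvecs, sigma=0.5):
    """Nyström extension of each eigenfunction to a novel state x:
    phi_j(x) ~ sum_i w(x, s_i) phi_j(s_i) / (1 - lambda_j),
    with w(x, .) the normalized affinities to the sampled states."""
    w = np.exp(-cdist(x[None, :], states) ** 2 / (2 * sigma ** 2)).ravel()
    w /= w.sum()
    return (w @ eigvecs) / np.maximum(1.0 - eigvals, 1e-8)
```

The graph parameters matter in practice: sigma controls how quickly affinities decay with distance, and k controls graph sparsity; the abstract's sensitivity experiments probe exactly this kind of choice.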
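The control side of the framework is least-squares policy iteration over these features. Below is a hedged sketch of the LSTDQ inner solve and the outer policy-iteration loop, assuming a batch of (s, a, r, s_next, done) tuples and state-action features that copy the state basis into a per-action block; the helper make_phi, the regularization constant, and the convergence tolerance are all illustrative, not the paper's implementation.

```python
import numpy as np

def make_phi(state_basis, num_actions):
    """State-action features: copy the state features (e.g. Nyström-extended
    proto-value functions) into the block for the taken action."""
    def phi(s, a):
        f = state_basis(s)
        out = np.zeros(len(f) * num_actions)
        out[a * len(f):(a + 1) * len(f)] = f
        return out
    return phi

def lstdq(samples, phi, dim, policy, gamma=0.99):
    """One LSTDQ solve: fit w so that Q(s, a) = phi(s, a) @ w under `policy`."""
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for (s, a, r, s_next, done) in samples:
        f = phi(s, a)
        f_next = 0.0 if done else phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(dim), b)  # small ridge term

def lspi(samples, phi, dim, num_actions, iters=20, gamma=0.99):
    """Outer loop: alternate greedy policy extraction and LSTDQ evaluation."""
    w = np.zeros(dim)
    for _ in range(iters):
        policy = lambda s: max(range(num_actions), key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, dim, policy, gamma)
        if np.allclose(w, w_new, atol=1e-4):
            break
        w = w_new
    return w
```

The block structure of the features lets a single weight vector represent one Q-function per action, which is the standard construction for discrete-action LSPI on benchmarks like the inverted pendulum and mountain car.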
