Manifold Regularization for Kernelized LSTD

Policy evaluation, i.e., value-function or Q-function approximation, is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods, so its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that exploits the intrinsic geometry of the state space, learned from data, to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method within the Least-Squares Policy Iteration (LSPI) framework, we observe superior policy quality on two standard benchmarks compared to widely used parametric basis functions.
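The abstract does not spell out the estimator, but a common way to combine kernelized LSTD with manifold regularization (in the spirit of Belkin et al.'s framework) is to represent the value function as a kernel expansion over the sampled states and to add a graph-Laplacian penalty on the fitted values alongside the usual RKHS penalty. The sketch below is a minimal illustration under those assumptions; the Gaussian kernel, the kNN graph construction, the regularization weights, and the use of a state-value function V rather than a Q-function are choices made here for brevity, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def knn_graph_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized kNN graph over the samples."""
    n = X.shape[0]
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                  # symmetrize the adjacency
    return np.diag(W.sum(1)) - W

def manifold_kernel_lstd(S, R, S_next, gamma=0.99, lam_rkhs=1e-3, lam_manifold=1e-2):
    """
    One plausible manifold-regularized kernelized LSTD estimator (an assumption,
    not the paper's exact formulation): V(s) = sum_i alpha_i k(s, s_i), fit from
    the LSTD fixed-point equations with an RKHS penalty and a graph-Laplacian
    penalty on the fitted values at the sampled states.
    """
    K = rbf_kernel(S, S)            # k(s_i, s_j)
    K_next = rbf_kernel(S_next, S)  # k(s_i', s_j)
    L = knn_graph_laplacian(S)
    n = S.shape[0]
    A = K @ (K - gamma * K_next) + lam_rkhs * K + lam_manifold * K @ L @ K
    alpha = np.linalg.solve(A + 1e-8 * np.eye(n), K @ R)
    return alpha

# Toy usage on a random batch of transitions (hypothetical data).
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 2))
S_next = S + 0.1 * rng.normal(size=(200, 2))
R = -np.linalg.norm(S, axis=1)
alpha = manifold_kernel_lstd(S, R, S_next)
V = rbf_kernel(S, S) @ alpha        # value estimates at the sampled states
```

Extending such a sketch to Q-functions typically amounts to defining the kernel on state-action pairs and evaluating it at (s', pi(s')) for the next-state term, which is how the estimator would be plugged into LSPI's policy evaluation step.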
