Paper information - Accelerated gradient temporal difference learning algorithms

Accelerated gradient temporal difference learning algorithms

In this paper we study Temporal Difference (TD) learning with linear value function approximation. The classic TD algorithm is known to be unstable when linear function approximation is combined with off-policy learning. Recently developed Gradient TD (GTD) algorithms have addressed this problem successfully. Despite their good scalability and convergence to correct solutions, they inherit the potential weakness of slow convergence because they are stochastic gradient descent algorithms. Accelerated stochastic gradient descent algorithms have been developed to speed up convergence while keeping computational complexity low. In this work, we develop an accelerated stochastic gradient descent method for minimizing the Mean Squared Projected Bellman Error (MSPBE), and derive a bound for the Lipschitz constant of the gradient of the MSPBE, which plays a critical role in our proposed accelerated GTD algorithms. Our comprehensive numerical experiments demonstrate promising performance on the policy evaluation problem in comparison to the GTD algorithm family. In particular, accelerated TDC surpasses state-of-the-art algorithms.
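To make the quantities named in the abstract concrete, the following is a minimal sketch of the plain (non-accelerated) TDC update and of the MSPBE it minimizes. It is not the accelerated method proposed in the paper. The toy problem (a random walk on a ring with random unit-norm features), the step sizes, and all function names are assumptions chosen for illustration. The MSPBE is assumed to take the common form (b - A theta)^T C^{-1} (b - A theta), with A = E[phi (phi - gamma phi')^T], b = E[r phi] and C = E[phi phi^T]. Under that form the gradient is linear in theta, so its exact Lipschitz constant on this toy problem is the largest eigenvalue of 2 A^T C^{-1} A; this is the exact value for the sketch, not the bound derived in the paper.

```python
import numpy as np


def make_chain(n_states=5, n_features=3, seed=0):
    """Hypothetical toy problem: symmetric random walk on a ring."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, (s - 1) % n_states] = 0.5
        P[s, (s + 1) % n_states] = 0.5
    # Random features, one row per state, scaled to unit norm.
    Phi = rng.normal(size=(n_states, n_features))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
    # Reward of 1 when leaving state 0, otherwise 0.
    r = np.zeros(n_states)
    r[0] = 1.0
    return P, Phi, r


def mspbe_terms(P, Phi, r, gamma):
    """Exact A, b, C under the stationary distribution of the chain."""
    n = len(r)
    # The walk is symmetric on an odd-length ring, so the stationary
    # distribution is uniform.
    D = np.diag(np.full(n, 1.0 / n))
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    C = Phi.T @ D @ Phi
    return A, b, C


def mspbe(theta, A, b, C):
    """MSPBE(theta) = (b - A theta)^T C^{-1} (b - A theta)."""
    g = b - A @ theta
    return float(g @ np.linalg.solve(C, g))


def grad_lipschitz(A, C):
    """Largest eigenvalue of the (constant) Hessian 2 A^T C^{-1} A."""
    H = 2.0 * A.T @ np.linalg.solve(C, A)
    return float(np.linalg.eigvalsh(H).max())


def tdc_step(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One on-policy TDC update with main weights theta and auxiliary weights w."""
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    theta_new = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
    w_new = w + beta * (delta - phi @ w) * phi
    return theta_new, w_new


def run_tdc(n_steps=20000, gamma=0.9, alpha=0.02, beta=0.1, seed=0):
    """Run TDC on the toy chain; return MSPBE before and after learning."""
    P, Phi, r = make_chain(seed=seed)
    A, b, C = mspbe_terms(P, Phi, r, gamma)
    rng = np.random.default_rng(seed + 1)
    theta = np.zeros(Phi.shape[1])
    w = np.zeros(Phi.shape[1])
    initial = mspbe(theta, A, b, C)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(len(r), p=P[s])
        theta, w = tdc_step(theta, w, Phi[s], r[s], Phi[s_next],
                            gamma, alpha, beta)
        s = s_next
    return initial, mspbe(theta, A, b, C)
```

Calling `run_tdc()` returns the MSPBE at the zero initialization and after learning, which gives a simple baseline against which an accelerated variant could be compared.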

[1] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.

[2] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[3] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.

[4] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.

[5] James T. Kwok, et al. Accelerated Gradient Methods for Stochastic Optimization and Online Learning, 2009, NIPS.

[6] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.

[7] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.

[8] Lihong Li, et al. A worst-case comparison between temporal difference and residual gradient with linear function approximation, 2008, ICML.

[9] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.

[10] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[11] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.

[12] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.