Accelerated Gradient Temporal Difference Learning
[2] P. Hansen. The discrete Picard condition for discrete ill-posed problems, 1990.
[3] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[4] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[6] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[7] Alborz Geramifard, et al. iLSTD: Eligibility Traces and Convergence Analysis, 2006, NIPS.
[8] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[9] Simon Günter, et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, 2007, AISTATS.
[10] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, 2009, J. Mach. Learn. Res.
[11] Alessandro Lazaric, et al. LSTD with Random Projections, 2010, NIPS.
[12] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[13] Wen Zhang, et al. Convergence of General Nonstationary Iterative Methods for Solving Singular Linear Equations, 2011, SIAM J. Matrix Anal. Appl.
[14] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[15] D. Bertsekas, et al. On the convergence of simulation-based iterative methods for solving singular linear systems, 2013.
[16] Daniel F. Salas, et al. Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems, 2013.
[17] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[18] Rémi Munos, et al. Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control, 2013, ECML/PKDD.
[19] Arash Givchi, et al. Quasi Newton Temporal Difference Learning, 2014, ACML.
[20] Aryan Mokhtari, et al. RES: Regularized Stochastic BFGS Algorithm, 2014, IEEE Transactions on Signal Processing.
[21] Philip S. Thomas, et al. Natural Temporal Difference Learning, 2014, AAAI.
[22] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[23] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[24] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, arXiv.
[25] Hao Shen, et al. Accelerated gradient temporal difference learning algorithms, 2014, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[26] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[27] Martha White, et al. Incremental Truncated LSTD, 2015, IJCAI.
[28] Patrick M. Pilarski, et al. True Online Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[29] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[30] Martha White, et al. Investigating Practical Linear Temporal Difference Learning, 2016, AAMAS.
[31] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.