Martha White | Richard S. Sutton | Huizhen Yu | Ashique Rupam Mahmood
[1] Richard S. Varga. Matrix Iterative Analysis , 1962 .
[2] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[3] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[4] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming , 1996, Athena Scientific.
[5] Steven J. Bradtke,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[6] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[7] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[8] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[9] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 1996, Machine Learning.
[10] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[11] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[12] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[13] Hamid Reza Maei. Gradient temporal-difference learning algorithms , 2011, PhD thesis, University of Alberta.
[14] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[15] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[16] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[17] Richard S. Sutton,et al. True online TD(λ) , 2014, ICML.
[18] Richard S. Sutton,et al. Off-policy TD(λ) with a true online equivalence , 2014, UAI.
[19] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[20] Richard S. Sutton,et al. Off-policy learning based on weighted importance sampling with linear computational complexity , 2015, UAI.
[21] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..