[1] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[2] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.
[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[4] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[5] Patrick M. Pilarski, et al. An Empirical Evaluation of True Online TD(λ), 2015, ArXiv.
[6] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[7] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[10] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[11] Patrick M. Pilarski, et al. Tuning-free step-size adaptation, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Richard S. Sutton, et al. True online TD(λ), 2014, ICML.
[13] Adam M. White. Developing a Predictive Approach to Knowledge, 2015.
[14] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[15] Andrew G. Barto, et al. Adaptive Step-Size for Online Temporal Difference Learning, 2012, AAAI.
[16] Richard S. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[17] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.