An Empirical Evaluation of True Online TD(λ)
Patrick M. Pilarski | Richard S. Sutton | Harm van Seijen | A. Rupam Mahmood
[1] Richard S. Sutton, et al. True online TD(λ), 2014, ICML 2014.
[2] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[3] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[4] B. Hudgins, et al. Myoelectric signal processing for control of powered limb prostheses, 2006, Journal of electromyography and kinesiology: official journal of the International Society of Electrophysiological Kinesiology.
[5] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[6] Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, 2014, ICML.
[7] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[8] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[9] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[10] Patrick M. Pilarski, et al. Adaptive artificial limbs: a real-time approach to prediction and anticipation, 2013, IEEE Robotics & Automation Magazine.
[11] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.