[1] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[2] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[3] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[4] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[5] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[6] Shie Mannor, et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis, 2015, AAAI.
[7] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[8] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[9] Manuela Veloso, et al. A survey of robot learning from demonstration, 2009.
[10] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[13] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[14] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[15] Leah M. Hackman, et al. Faster Gradient-TD Algorithms, 2013.
[16] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[17] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[18] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[19] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[20] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[21] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[22] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[23] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[24] R. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010.
[25] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, arXiv.
[26] Adam M. White, et al. Developing a Predictive Approach to Knowledge, 2015.
[27] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[30] Richard S. Sutton, et al. Off-policy learning based on weighted importance sampling with linear computational complexity, 2015, UAI.