[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[2] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[4] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[5] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[6] Reuven Y. Rubinstein. Simulation and the Monte Carlo Method, 1981, Wiley Series in Probability and Mathematical Statistics.
[7] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[8] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[9] Surya Ganguli, et al. On the saddle point problem for non-convex optimization, 2014, arXiv.
[10] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[11] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[12] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[13] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[14] Martha White, et al. Online Off-policy Prediction, 2018, arXiv.
[16] Harm van Seijen, et al. Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning, 2019, NeurIPS.
[18] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[19] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[20] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[21] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[22] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[23] Richard S. Sutton, et al. Learning to Predict Independent of Span, 2015, arXiv.
[24] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[25] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[26] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[27] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[28] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[29] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[30] Francisco S. Melo, et al. Q-Learning with Linear Function Approximation, 2007, COLT.
[32] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[33] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, arXiv.