Unifying n-Step Temporal-Difference Action-Value Methods