Natural Value Approximators: Learning when to Trust Past Estimates
Tom Schaul | David Silver | Zhongwen Xu | Joseph Modayil | Hado van Hasselt | André Barreto
[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[2] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[3] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It, 1990.
[4] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[5] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[6] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[7] D. Barrios-Aranibar, et al. Learning from Delayed Rewards Using Influence Values Applied to Coordination in Multi-Agent Systems, 2007.
[8] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[9] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[10] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[11] Richard S. Sutton, et al. Learning to Predict Independent of Span, 2015, ArXiv.
[12] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] R. Bellman. A Markovian Decision Process, 1957.
[15] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.