Marc G. Bellemare | Rémi Munos | Mohammad Gheshlaghi Azar | Audrunas Gruslys
[1] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[2] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[3] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[4] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[5] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[6] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[7] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[8] Honglak Lee, et al. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units, 2016, ICML.
[9] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[10] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[11] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[12] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[13] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[14] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[15] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[16] Doina Precup, et al. Investigating Recurrence and Eligibility Traces in Deep Q-Networks, 2017, ArXiv.