[1] Joshua B. Tenenbaum, et al. Building machines that learn and think like people, 2016, Behavioral and Brain Sciences.
[2] Richard Y. Chen, et al. UCB Exploration via Q-Ensembles, 2018.
[3] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[4] Peter Dayan, et al. Hippocampal Contributions to Control: The Third Way, 2007, NIPS.
[5] Ian Osband, et al. Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout, 2016.
[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[7] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[8] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[9] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[10] Marc Peter Deisenroth, et al. Deep Reinforcement Learning: A Brief Survey, 2017, IEEE Signal Processing Magazine.
[11] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[14] Demis Hassabis, et al. Neural Episodic Control, 2017, ICML.
[15] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[16] Joel Z. Leibo, et al. Model-Free Episodic Control, 2016, ArXiv.
[17] Marlos C. Machado, et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[18] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[19] Brendan Maginnis, et al. A short variational proof of equivalence between policy gradients and soft Q learning, 2017, ArXiv.
[20] Mieke Verfaellie, et al. Interdependence of episodic and semantic memory: Evidence from neuropsychology, 2010, Journal of the International Neuropsychological Society.
[21] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[22] Kavosh Asadi, et al. DeepMellow: Removing the Need for a Target Network in Deep Q-Learning, 2019, IJCAI.
[23] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[24] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[25] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[26] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[27] P. Dayan, et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, 2005, Nature Neuroscience.
[28] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[29] D. Anderson, et al. Algorithms for minimization without derivatives, 1974.
[30] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.