暂无分享,去创建一个
Tor Lattimore | Richard S. Sutton | Csaba Szepesvári | Satinder Singh | David Silver | Matteo Hessel | Benjamin Van Roy | Ian Osband | John Aslanides | Hado van Hasselt | Yotam Doron | Eren Sezener | Andre Saraiva | Katrina McKinney | R. Sutton | Katrina McKinney | D. Silver | Csaba Szepesvari | Satinder Singh | Tor Lattimore | Ian Osband | Matteo Hessel | H. V. Hasselt | Yotam Doron | J. Aslanides | Eren Sezener | Andre Saraiva | David Silver | John Aslanides
[1] K. Lewin. Psychology and the Process of Group Living , 1943 .
[2] J. Kiefer,et al. Stochastic Estimation of the Maximum of a Regression Function , 1952 .
[3] F ROSENBLATT,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[4] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[5] J. O'Keefe,et al. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. , 1971, Brain research.
[6] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[7] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[9] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[10] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[11] E. Kandel,et al. Cognitive Neuroscience and the Study of Memory , 1998, Neuron.
[12] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[13] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[14] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.
[15] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[16] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[17] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[18] Shane Legg,et al. A Collection of Definitions of Intelligence , 2007, AGI.
[19] Brian E. Granger,et al. IPython: A System for Interactive Scientific Computing , 2007, Computing in Science & Engineering.
[20] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[21] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[22] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[23] Brian Tanner,et al. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments , 2009, J. Mach. Learn. Res..
[24] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[25] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[26] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[27] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.
[30] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[31] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[32] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[33] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[34] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[36] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[37] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[38] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[39] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[40] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[41] Benjamin Van Roy,et al. Ensemble Sampling , 2017, NIPS.
[42] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[43] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[44] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[45] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract) , 2018, IJCAI.
[46] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[47] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[48] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[49] Mikhail Belkin,et al. Reconciling modern machine learning and the bias-variance trade-off , 2018, ArXiv.
[50] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[51] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[52] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[53] Pieter Abbeel,et al. Benchmarking Model-Based Reinforcement Learning , 2019, ArXiv.
[54] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..