Human-level control through deep reinforcement learning
