Zeb Kurth-Nelson | Joel Z. Leibo | Charles Blundell | Dhruva Tirumala | Dharshan Kumaran | Hubert Soyer | Jane X. Wang | Rémi Munos | Matt M. Botvinick
[1] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[2] H. Harlow, et al. The formation of learning sets, 1949, Psychological Review.
[3] R. Rescorla, et al. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, 1972.
[4] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[5] Peter R. Conwell, et al. Fixed-weight networks can learn, 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[6] Jieyu Zhao, et al. Simple Principles of Metalearning, 1996.
[7] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[8] Sebastian Thrun, et al. Learning to Learn: Introduction and Overview, 1998, Learning to Learn.
[9] A. Steven Younger, et al. Fixed-weight on-line learning, 1999, IEEE Trans. Neural Networks.
[10] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[11] Danil V. Prokhorov, et al. Adaptive behavior with fixed weights in RNN: an overview, 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).
[12] Kenji Doya, et al. Meta-learning in Reinforcement Learning, 2003, Neural Networks.
[13] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[14] P. Dayan, et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, 2005, Nature Neuroscience.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Xiao-Jing Wang, et al. Neural mechanism for stochastic behaviour during a competitive game, 2006, Neural Networks.
[17] P. J. Werbos, et al. Efficient Learning in Cellular Simultaneous Recurrent Neural Networks - The Case of Maze Navigation Problem, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[18] Timothy E. J. Behrens, et al. Learning the value of information in an uncertain world, 2007, Nature Neuroscience.
[19] Kunikazu Kobayashi, et al. A Meta-learning Method Based on Temporal Difference Error, 2009, ICONIP.
[20] Ethan S. Bromberg-Martin, et al. Midbrain Dopamine Neurons Signal Preference for Advance Information about Upcoming Rewards, 2009, Neuron.
[21] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Daeyeol Lee, et al. Mechanisms for Stochastic Decision Making in the Primate Frontal Cortex, 2009.
[23] P. Dayan, et al. Model-based influences on humans' choices and striatal prediction errors, 2011, Neuron.
[24] Peter Ford Dominey, et al. Robot Cognitive Control with a Neurophysiologically Inspired Reinforcement Learning Model, 2011, Front. Neurorobot.
[25] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[26] Peter Ford Dominey, et al. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters, 2013, Progress in Brain Research.
[27] Tor Lattimore, et al. Bounded Regret for Finite-Armed Structured Bandits, 2014, NIPS.
[28] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[29] Zeb Kurth-Nelson, et al. Model-Based Reasoning in Humans Becomes Automatic with Training, 2015, PLoS Comput. Biol.
[30] Peter Dayan, et al. Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-step Task, 2015, bioRxiv.
[31] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[32] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[33] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[34] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[35] Misha Denil, et al. Learning to Learn for Global Optimization of Black Box Functions, 2016, ArXiv.
[36] Wouter Kool, et al. When Does Model-Based Control Pay Off?, 2016, PLoS Comput. Biol.
[37] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[38] Joshua B. Tenenbaum, et al. Building machines that learn and think like people, 2016, Behavioral and Brain Sciences.
[39] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.
[40] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[41] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[42] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[43] Sergey Bartunov, et al. Meta-Learning with Memory-Augmented Neural Networks, 2016.
[44] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[45] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.
[46] Misha Denil, et al. Learning to Learn without Gradient Descent by Gradient Descent, 2016, ICML.
[47] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[48] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[49] Jitendra Malik, et al. Learning to Optimize, 2016, ICLR.