Reinforcement Learning, Fast and Slow

[1]  Joel Z. Leibo,et al.  Evolving intrinsic motivations for altruistic behavior , 2018, AAMAS.

[2]  Demis Hassabis,et al.  A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.

[3]  Zeb Kurth-Nelson,et al.  What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior , 2018, Neuron.

[4]  M Botvinick,et al.  Episodic Control as Meta-Reinforcement Learning , 2018, bioRxiv.

[5]  Koray Kavukcuoglu,et al.  Neural scene representation and rendering , 2018, Science.

[6]  Tom Schaul,et al.  Meta-learning by the Baldwin effect , 2018, GECCO.

[7]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[8]  Zeb Kurth-Nelson,et al.  Been There, Done That: Meta-Learning with Episodic Recall , 2018, ICML.

[9]  Joel Z. Leibo,et al.  Prefrontal cortex as a meta-reinforcement learning system , 2018, bioRxiv.

[10]  Samuel M. McClure,et al.  Hippocampal pattern separation supports reinforcement learning , 2018, Nature Communications.

[11]  Joel Z. Leibo,et al.  Unsupervised Predictive Memory in a Goal-Directed Agent , 2018, ArXiv.

[12]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[13]  Gary Marcus,et al.  Deep Learning: A Critical Appraisal , 2018, ArXiv.

[14]  Tom Schaul,et al.  Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[15]  Joel Z. Leibo,et al.  Human-level performance in first-person multiplayer games with population-based deep reinforcement learning , 2018, ArXiv.

[16]  Demis Hassabis,et al.  Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[17]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[18]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[19]  N. Uchida,et al.  Neural Circuitry of Reward Prediction Error. , 2017, Annual review of neuroscience.

[20]  Christopher Burgess,et al.  DARLA: Improving Zero-Shot Transfer in Reinforcement Learning , 2017, ICML.

[21]  K. Norman,et al.  Reinstated episodic context guides sampling-based decisions for reward , 2017, Nature Neuroscience.

[22]  Kevin J. Miller,et al.  Dorsal hippocampus contributes to model-based planning , 2017, Nature Neuroscience.

[23]  Joshua L. Jones,et al.  Dopamine transients are sufficient and necessary for acquisition of model-based associations , 2017, Nature Neuroscience.

[24]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[25]  Demis Hassabis,et al.  Neural Episodic Control , 2017, ICML.

[26]  Mel W. Khaw,et al.  Reminders of past choices bias decisions for reward in humans , 2017, Nature Communications.

[27]  Kevin Waugh,et al.  DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.

[28]  N. Daw,et al.  Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework , 2017, Annual review of psychology.

[29]  Zeb Kurth-Nelson,et al.  Learning to reinforcement learn , 2016, CogSci.

[30]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR.

[31]  Alex Graves,et al.  Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.

[32]  Joshua B. Tenenbaum,et al.  Human Learning in Atari , 2017, AAAI Spring Symposia.

[33]  Peter L. Bartlett,et al.  RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[34]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[35]  Timothy Edward John Behrens,et al.  Value, search, persistence and model updating in anterior cingulate cortex , 2016, Nature Neuroscience.

[36]  Fabian Grabenhorst,et al.  A dynamic code for economic object valuation in prefrontal cortex neurons , 2016, Nature Communications.

[37]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[38]  Xiao-Jing Wang,et al.  Reward-based training of recurrent neural networks for cognitive and value-based tasks , 2016, bioRxiv.

[39]  James L. McClelland,et al.  What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated , 2016, Trends in Cognitive Sciences.

[40]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[41]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[42]  Joel Z. Leibo,et al.  Model-Free Episodic Control , 2016, ArXiv.

[43]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[44]  Konrad P. Körding,et al.  Toward an Integration of Deep Learning and Neuroscience , 2016, bioRxiv.

[45]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[46]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[47]  Geoffrey Schoenbaum,et al.  Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework , 2016, eLife.

[48]  J. DiCarlo,et al.  Using goal-driven deep learning models to understand sensory cortex , 2016, Nature Neuroscience.

[49]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[50]  Yoram Singer,et al.  Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.

[51]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[52]  Alec Solway,et al.  Reinforcement learning, efficient coding, and the statistics of natural tasks , 2015, Current Opinion in Behavioral Sciences.

[53]  Matthew T. Kaufman,et al.  A neural network that finds a naturalistic solution for the production of muscle activity , 2015, Nature Neuroscience.

[54]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[55]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[56]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[57]  Nikolaus Kriegeskorte,et al.  Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation , 2014, PLoS Comput. Biol..

[58]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[59]  P. Dayan,et al.  Goals and Habits in the Brain , 2013, Neuron.

[60]  Peter Ford Dominey,et al.  Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters. , 2013, Progress in brain research.

[61]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[62]  O. Sporns,et al.  The economy of brain network organization , 2012, Nature Reviews Neuroscience.

[63]  Timothy E. J. Behrens,et al.  Frontal Cortex and Reward-Guided Learning and Decision-Making , 2011, Neuron.

[64]  P. Dayan,et al.  Model-based influences on humans’ choices and striatal prediction errors , 2011, Neuron.

[65]  Charles Kemp,et al.  How to Grow a Mind: Statistics, Structure, and Abstraction , 2011, Science.

[66]  P. Glimcher Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis , 2011, Proceedings of the National Academy of Sciences.

[67]  J. Tenenbaum,et al.  Probabilistic models of cognition: exploring representations and inductive biases , 2010, Trends in Cognitive Sciences.

[68]  D. Modha,et al.  Network architecture of the long-distance pathways in the macaque brain , 2010, Proceedings of the National Academy of Sciences.

[69]  P. Dayan,et al.  States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning , 2010, Neuron.

[70]  Charles Kemp,et al.  Bayesian models of cognition , 2008 .

[71]  Peter Dayan,et al.  Hippocampal Contributions to Control: The Third Way , 2007, NIPS.

[72]  Matthew M Botvinick,et al.  Multilevel structure in behaviour and in the brain: a model of Fuster's hierarchy , 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.

[73]  Dorothy Tse,et al.  Schemas and Memory Consolidation , 2007, Science.

[74]  Katherine D. Kinzler,et al.  Core knowledge. , 2007, Developmental science.

[75]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[76]  Michael J. Frank,et al.  Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia , 2006, Neural Computation.

[77]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[78]  Jonathan D. Cohen,et al.  Prefrontal cortex and flexible cognitive control: rules without symbols. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[79]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[80]  O. Sporns,et al.  Organization, development and function of complex brain networks , 2004, Trends in Cognitive Sciences.

[81]  Karin Ackermann,et al.  Categories and Concepts , 2003.

[82]  Kenji Doya,et al.  Meta-learning in Reinforcement Learning , 2003, Neural Networks.

[83]  D. Plaut Graded modality-specific specialisation in semantics: A computational account of optic aphasia , 2002, Cognitive neuropsychology.

[84]  Risto Miikkulainen,et al.  Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.

[85]  Junichiro Yoshimoto,et al.  Control of exploitation-exploration meta-parameter in reinforcement learning , 2002, Neural Networks.

[86]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[87]  E S Spelke,et al.  Core knowledge. , 2000, The American psychologist.

[88]  Jonathan Baxter,et al.  Theoretical Models of Learning to Learn , 1998, Learning to Learn.

[89]  Sebastian Thrun,et al.  Learning to Learn: Introduction and Overview , 1998, Learning to Learn.

[90]  B. Balleine,et al.  Goal-directed instrumental action: contingency and incentive learning and their cortical substrates , 1998, Neuropharmacology.

[91]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[92]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[93]  Jieyu Zhao,et al.  Simple Principles of Metalearning , 1996 .

[94]  P. Dayan,et al.  A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[95]  James L. McClelland,et al.  Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. , 1995, Psychological review.

[96]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[97]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[98]  Yoshua Bengio,et al.  Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[99]  James L. McClelland,et al.  Explorations in parallel distributed processing: a handbook of models, programs, and exercises , 1988 .

[100]  Geoffrey E. Hinton,et al.  How Learning Can Guide Evolution , 1987, Complex Syst.

[101]  G. Logan  Toward an instance theory of automatization , 1988, Psychological Review.

[102]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.

[103]  H. Harlow,et al.  The formation of learning sets. , 1949, Psychological review.