Deep Reinforcement Learning and Its Neuroscientific Implications