How Attention Can Create Synaptic Tags for the Learning of Working Memories in Sequential Tasks

Intelligence is our ability to learn appropriate responses to new stimuli and situations. Neurons in association cortex are thought to be essential for this ability. During learning, these neurons become tuned to relevant features and start to represent them with persistent activity during memory delays. This learning process is not well understood. Here we develop a biologically plausible learning scheme that explains how trial-and-error learning induces neuronal selectivity and working memory representations for task-relevant information. We propose that the response selection stage sends attentional feedback signals to earlier processing levels, forming synaptic tags at those connections responsible for the stimulus-response mapping. Globally released neuromodulators then interact with the tagged synapses to determine their plasticity. The resulting learning rule endows neural networks with the capacity to create new working memory representations of task-relevant information as persistent activity. The rule is remarkably generic: it explains how association neurons learn to store task-relevant information for linear as well as non-linear stimulus-response mappings, how they become tuned to category boundaries or analog variables, depending on the task demands, and how they learn to integrate probabilistic evidence for perceptual decisions.
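
The interplay of feedback-gated tags and a global neuromodulator can be illustrated with a small sketch. It is not the scheme developed in the paper, only a minimal single-trial illustration under assumptions that are not stated above: one sigmoidal association layer, linear action-value outputs, feedback weights equal to the feedforward weights, and a reward prediction error standing in for the globally released neuromodulator. The function `tagged_update` and all parameter names are hypothetical, and the memory units and temporal dynamics needed for working memory are left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tagged_update(x, W_in, W_out, action, reward, lr=0.1):
    """One trial of a tag-gated, neuromodulator-driven weight update (illustrative).

    x      : input activity, shape (n_in,)
    W_in   : input -> association weights, shape (n_hid, n_in)
    W_out  : association -> action-value weights, shape (n_act, n_hid)
    action : index of the selected response
    reward : scalar outcome of the trial
    Returns the updated (W_in, W_out) and the global signal delta.
    """
    # Feedforward pass: association units, then action values.
    h = sigmoid(W_in @ x)
    q = W_out @ h

    # Attentional feedback comes only from the selected action.
    # Assumption: feedback weights mirror the feedforward weights.
    feedback = W_out[action]

    # Tags mark the synapses responsible for the selected response.
    tag_out = np.zeros_like(W_out)
    tag_out[action] = h                                # presynaptic activity
    tag_in = np.outer(feedback * h * (1.0 - h), x)     # feedback-gated tags

    # Global neuromodulatory signal: reward prediction error (assumption).
    delta = reward - q[action]

    # Plasticity is the product of the global signal and the local tag.
    return W_in + lr * delta * tag_in, W_out + lr * delta * tag_out, delta
```

In this sketch, synapses onto unselected actions and synapses from silent inputs carry no tag, so the global signal leaves them unchanged. Only the connections that contributed to the chosen response are modified.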
