Attention-Gated Reinforcement Learning of Internal Representations for Classification

Animal learning is associated with changes in the efficacy of connections between neurons. The rules that govern this plasticity can be tested in neural networks. Rules that train neural networks to map stimuli onto outputs are given by supervised learning and reinforcement learning theories. Supervised learning is efficient but biologically implausible. In contrast, reinforcement learning is biologically plausible but comparatively inefficient. It lacks a mechanism that can identify units at early processing levels that play a decisive role in the stimulus-response mapping. Here we show that this so-called credit assignment problem can be solved by a new role for attention in learning. There are two factors in our new learning scheme that determine synaptic plasticity: (1) a reinforcement signal that is homogeneous across the network and depends on the amount of reward obtained after a trial, and (2) an attentional feedback signal from the output layer that limits plasticity to those units at earlier processing levels that are crucial for the stimulus-response mapping. The new scheme is called attention-gated reinforcement learning (AGREL). We show that it is as efficient as supervised learning in classification tasks. AGREL is biologically realistic and integrates the role of feedback connections, attention effects, synaptic plasticity, and reinforcement learning signals into a coherent framework.
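The two-factor scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the published AGREL equations: the abstract does not give the update rule, so the network shape, the softmax action selection, the use of the chosen unit's probability as the expected reward, and every name below (`GatedNet`, `trial`, `lr`) are assumptions. Factor (1) is the scalar `delta`, which is the same for every synapse. Factor (2) is `feedback`, the weights running back from the selected output unit, which limit plasticity to the hidden units that contributed to the chosen response.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(acts):
    top = max(acts)  # subtract the maximum for numerical stability
    exps = [math.exp(a - top) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]


class GatedNet:
    """Hypothetical three-layer network; equations are illustrative assumptions."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr

        def init(n):
            return [self.rng.uniform(-0.25, 0.25) for _ in range(n)]

        self.v = [init(n_in + 1) for _ in range(n_hidden)]   # input -> hidden, plus bias
        self.w = [init(n_hidden + 1) for _ in range(n_out)]  # hidden -> output, plus bias

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        y = [sigmoid(sum(a * b for a, b in zip(row, xb))) for row in self.v]
        yb = y + [1.0]
        p = softmax([sum(a * b for a, b in zip(row, yb)) for row in self.w])
        return xb, y, yb, p

    def trial(self, x, target):
        xb, y, yb, p = self.forward(x)
        # Stochastic choice of a single output unit (the "response").
        s = self.rng.choices(range(len(p)), weights=p)[0]
        reward = 1.0 if s == target else 0.0
        # Factor 1: global reinforcement signal, identical for all synapses.
        # Assumption: the expected reward is the probability of the chosen unit.
        delta = reward - p[s]
        # Factor 2: feedback from the selected output unit only (copied before update).
        feedback = list(self.w[s][:-1])
        # Only the selected output unit's incoming weights change.
        for j, yj in enumerate(yb):
            self.w[s][j] += self.lr * delta * yj
        # Hidden-unit plasticity is gated by the feedback it receives.
        for j, yj in enumerate(y):
            gate = feedback[j] * yj * (1.0 - yj)
            for i, xi in enumerate(xb):
                self.v[j][i] += self.lr * delta * gate * xi
        return s, reward
```

A training loop would call `trial(x, target)` repeatedly on labelled stimuli and track the returned reward. Note that the network never sees `target` directly; it is used only to decide whether the chosen response is rewarded, which is what distinguishes this setting from supervised learning.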
