Dopamine-Signaled Reward Predictions Generated by Competitive Excitation and Inhibition in a Spiking Neural Network Model

Dopaminergic neurons in the mammalian substantia nigra display characteristic phasic responses to stimuli that reliably predict the receipt of primary rewards. These responses have been suggested to encode reward prediction-errors similar to those used in reinforcement learning. Here, we propose a model of dopaminergic activity in which prediction-error signals are generated by the joint action of short-latency excitation and long-latency inhibition, in a network undergoing dopaminergic neuromodulation of both spike-timing-dependent synaptic plasticity and neuronal excitability. In contrast to previous models, sensitivity to recent events is maintained by the selective modification of specific striatal synapses, efferent to cortical neurons exhibiting stimulus-specific, temporally extended activity patterns. In the presence of significant background activity, our model shows (i) a shift in dopaminergic response from reward to reward-predicting stimuli, (ii) preservation of a response to unexpected rewards, and (iii) a precisely timed below-baseline dip in activity when expected rewards are omitted.
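For readers unfamiliar with the reinforcement-learning prediction errors mentioned above, the three response properties (i) to (iii) can be illustrated with a textbook temporal-difference (TD) learner. The sketch below is not the spiking network proposed here; it is a minimal tabular TD(0) example under stated assumptions: a tapped-delay-line representation of the cue, a pre-cue value fixed at zero (the cue itself is taken to be unpredictable), and hypothetical names and parameters (`run_trial`, `alpha`, `gamma`, the trial timings).

```python
def run_trial(weights, cue_t, reward_t, n_steps,
              alpha=0.1, gamma=1.0, reward=1.0, learn=True):
    """Run one trial of tabular TD(0); return the prediction errors.

    weights[k] is the value estimate k steps after cue onset (an assumed
    tapped-delay-line code). deltas[t] is the prediction error for the
    transition from time step t to t + 1.
    """
    def value(t):
        k = t - cue_t
        # States before the cue or after the reward time have value 0.
        return weights[k] if 0 <= k < len(weights) else 0.0

    deltas = []
    for t in range(n_steps - 1):
        r = reward if t + 1 == reward_t else 0.0
        delta = r + gamma * value(t + 1) - value(t)  # TD prediction error
        deltas.append(delta)
        k = t - cue_t
        if learn and 0 <= k < len(weights):
            weights[k] += alpha * delta
    return deltas


# Illustrative timings (assumptions): cue at step 2, reward at step 7.
CUE_T, REWARD_T, N_STEPS = 2, 7, 10
weights = [0.0] * (REWARD_T - CUE_T)

# First trial: the only non-zero error coincides with the reward.
first = run_trial(weights, CUE_T, REWARD_T, N_STEPS)

# Repeated cue-reward pairings.
for _ in range(500):
    run_trial(weights, CUE_T, REWARD_T, N_STEPS)

# (i) After learning, the error appears at cue onset, not at the reward.
trained = run_trial(weights, CUE_T, REWARD_T, N_STEPS, learn=False)

# (ii) A reward with no preceding cue still produces a positive error.
unexpected = run_trial(weights, N_STEPS, REWARD_T, N_STEPS, learn=False)

# (iii) Omitting the expected reward gives a negative error at reward time.
omitted = run_trial(weights, CUE_T, REWARD_T, N_STEPS,
                    reward=0.0, learn=False)

for name, d in [("first", first), ("trained", trained),
                ("unexpected", unexpected), ("omitted", omitted)]:
    print(name, [round(x, 3) for x in d])
```

In this sketch the error at index `CUE_T - 1` corresponds to cue onset and the error at index `REWARD_T - 1` to the reward time. The delay-line representation is what gives the omission dip its precise timing in the TD account; the model proposed here instead attributes timing to long-latency inhibition driven by temporally extended cortical activity.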
