Solving the distal reward problem through linkage of STDP and dopamine signaling

In Pavlovian and instrumental conditioning, reward typically comes seconds after reward-triggering actions, creating an explanatory conundrum known as the "distal reward problem": How does the brain know which firing patterns of which neurons are responsible for the reward if (1) the patterns are no longer there when the reward arrives and (2) all neurons and synapses are active during the waiting period before the reward? Here, we show how the conundrum is resolved by a model network of cortical spiking neurons with spike-timing-dependent plasticity (STDP) modulated by dopamine (DA). Although STDP is triggered by nearly coincident firing patterns on a millisecond timescale, the slow kinetics of subsequent synaptic plasticity are sensitive to changes in the extracellular DA concentration during a critical period of a few seconds. Random firings during the waiting period before the reward do not affect STDP and hence make the network insensitive to the ongoing activity. This is the key feature that distinguishes our approach from previous theoretical studies, which implicitly assume that the network is quiet during the waiting period or that the patterns are preserved until the reward arrives. This study emphasizes the importance of precise firing patterns in brain dynamics and suggests how a global diffusive reinforcement signal in the form of extracellular DA can selectively influence the right synapses at the right time.
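
The interplay between millisecond-scale STDP and second-scale DA sensitivity can be illustrated with a minimal sketch. It assumes each synapse carries a slowly decaying eligibility trace `c` that is kicked by STDP events, and that the synaptic weight `s` changes at a rate proportional to `c` times the extracellular DA level `d`. The function name `simulate_synapse`, the time constant `tau_c = 1.0` s, the square DA pulse, and the zero DA baseline are illustrative assumptions, not details stated in the abstract.

```python
def simulate_synapse(stdp_events, da_pulse, tau_c=1.0, dt=0.001, t_end=10.0):
    """Euler-integrate a single DA-modulated synapse (illustrative sketch).

    Assumed dynamics:
        dc/dt = -c / tau_c + (instantaneous kicks from STDP events)
        ds/dt = c * d(t)

    stdp_events: list of (time_s, amplitude); a positive amplitude stands
                 for pre-before-post pairing, a negative one for the reverse.
    da_pulse:    (start_s, stop_s, level) describing a square DA transient;
                 DA is assumed to be zero outside the pulse.
    Returns the net change in synaptic weight s over the simulation.
    """
    steps = int(round(t_end / dt))

    # Map each STDP event to the nearest integration step.
    kicks = {}
    for t_event, amplitude in stdp_events:
        k = int(round(t_event / dt))
        kicks[k] = kicks.get(k, 0.0) + amplitude

    da_start, da_stop, da_level = da_pulse
    c = 0.0  # eligibility trace
    s = 0.0  # accumulated weight change

    for k in range(steps):
        t = k * dt
        c += kicks.get(k, 0.0)                       # STDP tags the synapse
        d = da_level if da_start <= t < da_stop else 0.0
        s += c * d * dt                              # DA gates the weight change
        c -= (c / tau_c) * dt                        # trace decays over seconds
    return s


if __name__ == "__main__":
    pairing = [(0.0, 1.0)]  # one pre-before-post pairing at t = 0
    early = simulate_synapse(pairing, (1.0, 1.2, 1.0))  # reward after 1 s
    late = simulate_synapse(pairing, (5.0, 5.2, 1.0))   # reward after 5 s
    none = simulate_synapse(pairing, (1.0, 1.2, 0.0))   # no DA transient
    print(early, late, none)
```

In this sketch a reward arriving 1 s after the pairing strengthens the synapse more than one arriving 5 s later, and without a DA transient the weight does not change at all. This is one way a global signal could act selectively on recently tagged synapses.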