PVLV: the primary value and learned value Pavlovian learning algorithm.

The authors present their primary value and learned value (PVLV) model for understanding the reward-predictive firing properties of dopamine (DA) neurons as an alternative to the temporal-differences (TD) algorithm. PVLV is more directly related to the underlying biology than TD and is also more robust to variability in the environment. The primary value (PV) system controls performance and learning during primary rewards, whereas the learned value (LV) system learns about conditioned stimuli. The PV system essentially implements the Rescorla-Wagner/delta rule and comprises the neurons in the ventral striatum/nucleus accumbens that inhibit DA cells. The LV system comprises the neurons in the central nucleus of the amygdala that excite DA cells. The authors show that the PVLV model can account for critical aspects of the DA firing data and that it makes a number of clear predictions about lesion effects, several of which are consistent with existing data. For example, first- and second-order conditioning can be anatomically dissociated, which is consistent with PVLV but not with TD. Overall, the model provides a biologically plausible framework for understanding the neural basis of reward learning.
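The abstract describes the PV system as essentially the Rescorla-Wagner/delta rule. As a rough illustration of that rule alone (not of the PVLV model, its LV system, or the authors' implementation), the sketch below updates stimulus weights in proportion to the difference between the delivered reward and the predicted reward. The function name `rescorla_wagner`, the `learning_rate` value, and the single-stimulus training schedule are assumptions chosen for the example.

```python
def rescorla_wagner(trials, n_stimuli, learning_rate=0.1):
    """Generic Rescorla-Wagner / delta-rule learner (illustrative only).

    trials: sequence of (stimulus_vector, reward) pairs, where each
    stimulus_vector has n_stimuli entries (1.0 = present, 0.0 = absent).
    Returns the final weights and the per-trial prediction errors.
    """
    weights = [0.0] * n_stimuli
    deltas = []
    for stimuli, reward in trials:
        # Predicted reward is the summed weight of the stimuli present.
        prediction = sum(w * s for w, s in zip(weights, stimuli))
        # Prediction error: delivered reward minus expected reward.
        delta = reward - prediction
        # Each present stimulus moves its weight toward removing the error.
        for i, s in enumerate(stimuli):
            weights[i] += learning_rate * delta * s
        deltas.append(delta)
    return weights, deltas


# Hypothetical schedule: one stimulus paired with reward on 50 trials.
trials = [([1.0], 1.0)] * 50
weights, deltas = rescorla_wagner(trials, n_stimuli=1)
print(f"first error: {deltas[0]:.3f}, last error: {deltas[-1]:.3f}")
```

In this toy run the prediction error starts at the full reward value and shrinks toward zero as the reward becomes expected, which is the qualitative pattern a delta-rule learner produces for a fully predicted primary reward.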
