Modelling Individual Differences in the Form of Pavlovian Conditioned Approach Responses: A Dual Learning Systems Approach with Factored Representations

Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and underlying physiological observations. However, in recent autoshaping experiments in rats, variation in the form of Pavlovian conditioned responses (CRs) and in the associated dopamine activity has called into question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal that arises from a classical Model-Free system and is necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself, a lever, more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to process stimuli individually, by using factored representations, can explain why classical dopaminergic patterns may be observed in some rats and not in others, depending on the CR they develop. In addition, the model can account for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model yields a set of experimental predictions that may be tested in a modified experimental protocol.
We suggest that further investigation of factored representations in computational neuroscience studies may be useful.
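
The idea of combining a Model-Based system with a Model-Free system that values individual stimuli can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the model as specified here: the function names, the weighting parameter `omega`, the learning parameters, and all numerical values are hypothetical. The sketch only shows how a prediction error computed per stimulus feature (lever, magazine), mixed with Model-Based action values, could yield different preferred responses in different individuals.

```python
import math

# Hypothetical mapping from each action to the stimulus feature it targets.
ACTION_FEATURE = {"approach_lever": "lever", "approach_magazine": "magazine"}


def factored_td_update(feature_values, feature, reward, next_value,
                       alpha=0.2, gamma=0.9):
    """Temporal-difference update on ONE stimulus feature (not a whole state).

    Returns the reward prediction error computed for that feature.
    alpha and gamma are illustrative values, not fitted parameters.
    """
    delta = reward + gamma * next_value - feature_values[feature]
    feature_values[feature] += alpha * delta
    return delta


def integrate(mb_values, feature_values, action_feature, omega):
    """Mix Model-Based action values with Model-Free feature values.

    omega is a hypothetical per-individual weight in [0, 1]:
    omega near 1 favours the feature-based Model-Free values,
    omega near 0 favours the Model-Based values.
    """
    return {
        action: (1.0 - omega) * mb_values[action]
        + omega * feature_values[action_feature[action]]
        for action in mb_values
    }


def softmax(values, beta=1.0):
    """Convert integrated values into action probabilities."""
    top = max(values.values())
    exps = {a: math.exp(beta * (v - top)) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical values chosen only to make the contrast visible.
    mb_values = {"approach_lever": 0.4, "approach_magazine": 0.6}
    feature_values = {"lever": 0.9, "magazine": 0.5}
    for label, omega in (("high omega", 0.9), ("low omega", 0.1)):
        mixed = integrate(mb_values, feature_values, ACTION_FEATURE, omega)
        print(label, softmax(mixed, beta=5.0))
```

Under these assumed numbers, an individual with a high `omega` prefers approaching the lever (a sign-tracking-like response), whereas one with a low `omega` prefers approaching the magazine (a goal-tracking-like response). In the second case the feature-level prediction error contributes little to behaviour.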

[1]  B. Balleine,et al.  Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning , 2004, The European journal of neuroscience.

[2]  H. Yin,et al.  The role of the basal ganglia in habit formation , 2006, Nature Reviews Neuroscience.

[3]  Christopher M. Vigorito,et al.  Autonomous Hierarchical Skill Acquisition in Factored MDPs , 2008 .

[4]  B. Balleine,et al.  The role of the dorsomedial striatum in instrumental conditioning , 2005, The European journal of neuroscience.

[5]  Huda Akil,et al.  Individual differences in the attribution of incentive salience to reward-related cues: Implications for addiction , 2009, Neuropharmacology.

[6]  Raymond J. Dolan,et al.  Disentangling the Roles of Approach, Activation and Valence in Instrumental and Pavlovian Responding , 2011, PLoS Comput. Biol..

[7]  K. Berridge,et al.  Which cue to ‘want’? Opioid stimulation of central amygdala makes goal-trackers show stronger goal-tracking, just as sign-trackers show stronger sign-tracking , 2012, Behavioural Brain Research.

[8]  Kenji Doya,et al.  Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces , 2013, Front. Neurorobot..

[9]  Jadin C. Jackson,et al.  Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. , 2007, Psychological review.

[10]  Y. Niv,et al.  Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning , 2011, The Journal of Neuroscience.

[11]  K. Berridge The debate over dopamine’s role in reward: the case for incentive salience , 2007, Psychopharmacology.

[12]  M. Khamassi,et al.  Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies , 2012, Front. Behav. Neurosci..

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  W. Schultz,et al.  Influence of Reward Delays on Responses of Dopamine Neurons , 2008, The Journal of Neuroscience.

[15]  Raymond J. Dolan,et al.  Go and no-go learning in reward and punishment: Interactions between affect and effect , 2012, NeuroImage.

[16]  Craig Boutilier,et al.  Stochastic dynamic programming with factored representations , 2000, Artif. Intell..

[17]  T. Robinson,et al.  Quantifying Individual Variation in the Propensity to Attribute Incentive Salience to Reward Cues , 2012, PloS one.

[18]  T. Robinson,et al.  A food predictive cue must be attributed with incentive salience for it to induce c-fos mRNA expression in cortico-striatal-thalamic brain regions , 2011, Neuroscience.

[19]  P. Redgrave,et al.  Testing computational hypotheses of brain systems function: a case study with the basal ganglia , 2004, Network.

[20]  T. Robinson,et al.  Dissociating the Predictive and Incentive Motivational Properties of Reward-Related Cues Through the Study of Individual Differences , 2009, Biological Psychiatry.

[21]  P. Dayan,et al.  Actions , Policies , Values , and the Basal Ganglia , 2005 .

[22]  Peter Dayan,et al.  Bonsai Trees in Your Head: How the Pavlovian System Sculpts Goal-Directed Choices by Pruning Decision Trees , 2012, PLoS Comput. Biol..

[23]  G. Schoenbaum,et al.  Model‐based learning and the contribution of the orbitofrontal cortex to the model‐free world , 2012, The European journal of neuroscience.

[24]  C. Gallistel,et al.  Time, rate, and conditioning. , 2000, Psychological review.

[25]  M. Roesch,et al.  Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.

[26]  L. Kamin Predictability, surprise, attention, and conditioning , 1967 .

[27]  N. Schmajuk,et al.  Latent inhibition: a neural network approach. , 1996, Journal of experimental psychology. Animal behavior processes.

[28]  P. Dayan,et al.  States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning , 2010, Neuron.

[29]  Peter Redgrave,et al.  A computational model of action selection in the basal ganglia. I. A new functional anatomy , 2001, Biological Cybernetics.

[30]  K. Berridge,et al.  Which Cue to “Want?” Central Amygdala Opioid Activation Enhances and Focuses Incentive Salience on a Prepotent Reward Cue , 2009, The Journal of Neuroscience.

[31]  P. Balsam,et al.  Intertrial interval and unconditioned stimulus durations in autoshaping , 1979 .

[32]  A. Redish,et al.  Addiction as a Computational Process Gone Awry , 2004, Science.

[33]  M. Khamassi,et al.  Dopaminergic Control of the Exploration-Exploitation Trade-Off via the Basal Ganglia , 2012, Front. Neurosci..

[34]  P. Phillips,et al.  Pavlovian valuation systems in learning and decision making , 2012, Current Opinion in Neurobiology.

[35]  Steven C Stout,et al.  Sometimes-competing retrieval (SOCR): a formalization of the comparator hypothesis. , 2007, Psychological review.

[36]  A. Barto,et al.  Adaptive Critics and the Basal Ganglia , 1994 .

[37]  W. Schultz,et al.  Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.

[38]  Christian Balkenius,et al.  Dynamics of a Classical Conditioning Model , 1998, Auton. Robots.

[39]  D. R. Sparta,et al.  Lever conditioned stimulus-directed autoshaping induced by saccharin-ethanol unconditioned stimulus solution: effects of ethanol concentration and trial spacing. , 2003, Alcohol.

[40]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[41]  P. Fletcher,et al.  Faculty Opinions recommendation of A selective role for dopamine in stimulus-reward learning. , 2011 .

[42]  K. Berridge,et al.  Instant Transformation of Learned Repulsion into Motivational “Wanting” , 2013, Current Biology.

[43]  Mitsuo Kawato,et al.  Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.

[44]  W. Newsome,et al.  The temporal precision of reward prediction in dopamine neurons , 2008, Nature Neuroscience.

[45]  T. Robinson,et al.  Rats prone to attribute incentive salience to reward cues are also prone to impulsive action , 2011, Behavioural Brain Research.

[46]  B. Balleine,et al.  Double Dissociation of Basolateral and Central Amygdala Lesions on the General and Outcome-Specific Forms of Pavlovian-Instrumental Transfer , 2005, The Journal of Neuroscience.

[47]  B. Everitt,et al.  Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex , 2002, Neuroscience & Biobehavioral Reviews.

[48]  Aaron C. Courville,et al.  Bayesian theories of conditioning in a changing world , 2006, Trends in Cognitive Sciences.

[49]  Sadahiko Nakajima,et al.  Overexpectation in appetitive Pavlovian and instrumental conditioning , 1998 .

[50]  P. Dayan,et al.  Reward, Motivation, and Reinforcement Learning , 2002, Neuron.

[51]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[52]  Amir Dezfouli,et al.  Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes , 2011, PLoS Comput. Biol..

[53]  B A Williams,et al.  Conditioned Reinforcement: Experimental and Theoretical Issues , 1994, The Behavior analyst.

[54]  Alec Solway,et al.  Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates. , 2012, Psychological review.

[55]  L. Panlilio,et al.  Blocking of conditioning to a cocaine-paired stimulus: Testing the hypothesis that cocaine perpetually produces a signal of larger-than-expected reward , 2007, Pharmacology Biochemistry and Behavior.

[56]  A. Cooper,et al.  Predictive Reward Signal of Dopamine Neurons , 2011 .

[57]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[58]  B. Balleine,et al.  Reward‐guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico‐basal ganglia networks , 2008, The European journal of neuroscience.

[59]  David S. Touretzky,et al.  Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.

[60]  Peter Dayan,et al.  Non-commercial Research and Educational Use including without Limitation Use in Instruction at Your Institution, Sending It to Specific Colleagues That You Know, and Providing a Copy to Your Institution's Administrator. All Other Uses, Reproduction and Distribution, including without Limitation Comm , 2022 .

[61]  A. Tomie,et al.  Pairings of lever and food induce Pavlovian conditioned approach of sign-tracking and goal-tracking in C57BL/6 mice , 2012, Behavioural Brain Research.

[62]  Anna M. Lomanowska,et al.  Inadequate early social experience increases the incentive salience of reward-related cues in adulthood , 2011, Behavioural Brain Research.

[63]  Marc Toussaint,et al.  Probabilistic inference for solving discrete and continuous state Markov Decision Processes , 2006, ICML.

[64]  Olivier Sigaud,et al.  Learning the structure of Factored Markov Decision Processes in reinforcement learning problems , 2006, ICML.

[65]  Mehdi Khamassi,et al.  Which Temporal Difference learning algorithm best reproduces dopamine activity in a multi-choice task? , 2012, BMC Neuroscience.

[66]  Mehdi Khamassi,et al.  Combining Self-organizing Maps with Mixtures of Experts: Application to an Actor-Critic Model of Reinforcement Learning in the Basal Ganglia , 2006, SAB.

[67]  J. Mink THE BASAL GANGLIA: FOCUSED SELECTION AND INHIBITION OF COMPETING MOTOR PROGRAMS , 1996, Progress in Neurobiology.

[68]  T. Robinson,et al.  Individual differences in the propensity to approach signals vs goals promote different adaptations in the dopamine system of rats , 2007, Psychopharmacology.

[69]  Matthijs A. A. van der Meer,et al.  Information Processing in Decision-Making Systems , 2012, The Neuroscientist : a review journal bringing neurobiology, neurology and psychiatry.

[70]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[71]  G. Davey,et al.  Autoshaping in the rat: The effects of localizable visual and auditory signals for food. , 1983, Journal of the experimental analysis of behavior.

[72]  A. Graybiel,et al.  Differential Dynamics of Activity Changes in Dorsolateral and Dorsomedial Striatal Loops during Learning , 2010, Neuron.

[73]  A. Graybiel Habits, rituals, and the evaluative brain. , 2008, Annual review of neuroscience.

[74]  T. Robinson,et al.  The role of dopamine in the accumbens core in the expression of Pavlovian‐conditioned responses , 2012, The European journal of neuroscience.

[75]  O. Hikosaka Models of information processing in the basal Ganglia edited by James C. Houk, Joel L. Davis and David G. Beiser, The MIT Press, 1995. $60.00 (400 pp) ISBN 0 262 08234 9 , 1995, Trends in Neurosciences.

[76]  G. Elmer,et al.  Disruption of conditioned reward association by typical and atypical antipsychotics , 2010, Pharmacology Biochemistry and Behavior.

[77]  P. Dayan,et al.  Model-based influences on humans’ choices and striatal prediction errors , 2011, Neuron.

[78]  P. Redgrave,et al.  The basal ganglia: a vertebrate solution to the selection problem? , 1999, Neuroscience.

[79]  E. Vaadia,et al.  Midbrain dopamine neurons encode decisions for future action , 2006, Nature Neuroscience.

[80]  M. Roesch,et al.  The Orbitofrontal Cortex and Ventral Tegmental Area Are Necessary for Learning from Unexpected Outcomes , 2009, Neuron.

[81]  N. Daw,et al.  Multiplicity of control in the basal ganglia: computational roles of striatal subregions , 2011, Current Opinion in Neurobiology.

[82]  Y. Niv,et al.  Exploring a latent cause theory of classical conditioning , 2012, Learning & Behavior.

[83]  B. Skinner,et al.  The Behavior of Organisms: An Experimental Analysis , 2016 .

[84]  Dylan A. Simon,et al.  Dual-System Learning Models and Drugs of Abuse , 2012 .

[85]  Stéphane Doncieux,et al.  Sferesv2: Evolvin' in the multi-core world , 2010, IEEE Congress on Evolutionary Computation.