A theory of actions and habits: The interaction of rate correlation and contiguity systems in free-operant behavior.

Contemporary theories of instrumental performance assume that responding can be controlled by two behavioral systems: a goal-directed system that encodes the outcome of an action, and a habitual system that reinforces the response strength of that same action. Here we present a model of free-operant behavior in which goal-directed control is determined by the correlation between the rates of the action and the outcome, whereas habitual response strength is controlled by the total prediction error generated by contiguous reinforcement by the outcome. The outputs of these two systems summate to generate a total response strength. This cooperative model addresses, among other phenomena, the difference in the behavioral impact of ratio and interval schedules, the transition from goal-directed to habitual control with extended training, and the persistence of goal-directed control under choice procedures and following extinction. In these respects, this dual-system model is unique in its account of free-operant behavior.
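
The two systems and their summation can be illustrated with a minimal Python sketch. It is not the model's actual equations, which the abstract does not state. It assumes that the session is divided into equal time samples, that goal-directed strength is the Pearson correlation between per-sample response and outcome counts, and that habit strength follows a simple delta rule over responses. The function names, the learning rate `alpha`, and the example counts are all hypothetical.

```python
from math import sqrt


def rate_correlation(responses, outcomes):
    """Goal-directed strength (assumed form): Pearson correlation between
    per-sample response counts and per-sample outcome counts."""
    n = len(responses)
    mean_r = sum(responses) / n
    mean_o = sum(outcomes) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(responses, outcomes))
    var_r = sum((r - mean_r) ** 2 for r in responses)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_r == 0 or var_o == 0:
        return 0.0  # assumption: no variation in either rate gives zero correlation
    return cov / sqrt(var_r * var_o)


def habit_strength(reinforced, alpha=0.1, h=0.0):
    """Habit strength (assumed form): delta-rule update after each response,
    where `reinforced` holds 1 if the response was immediately followed by
    the outcome and 0 otherwise."""
    for outcome in reinforced:
        h += alpha * (outcome - h)  # prediction error scaled by learning rate
    return h


def total_strength(g, h):
    """The outputs of the two systems summate."""
    return g + h


# Hypothetical ratio-like samples: outcome counts track response counts.
ratio_g = rate_correlation([10, 20, 5, 15], [2, 4, 1, 3])
# Hypothetical interval-like samples: outcome counts do not vary with responding.
interval_g = rate_correlation([10, 20, 5, 15], [1, 1, 1, 1])
h = habit_strength([1, 1], alpha=0.5)
print(ratio_g, interval_g, h, total_strength(ratio_g, h))
```

Under these assumptions the ratio-like samples yield a higher rate correlation than the interval-like samples, while the habit term depends only on the history of contiguous reinforcement, so the two components can diverge even when the reinforcement history is the same.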
