A theory of actions and habits: The interaction of rate correlation and contiguity systems in free-operant behavior.

Contemporary theories of instrumental performance assume that responding can be controlled by two behavioral systems: a goal-directed system that encodes the outcome of an action, and a habitual system that reinforces the response strength of the same action. Here we present a model of free-operant behavior in which goal-directed control is determined by the correlation between the rates of the action and the outcome, whereas the total prediction error generated by contiguous reinforcement by the outcome controls habitual response strength. The outputs of these two systems summate to generate a total response strength. This cooperative model addresses the difference in the behavioral impact of ratio and interval schedules, the transition from goal-directed to habitual control with extended training, and the persistence of goal-directed control under choice procedures and following extinction, among other phenomena. In these respects, this dual-system model is unique in its account of free-operant behavior. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
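The summation of the two systems described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the abstract gives no equations, so the Pearson correlation over sampled counts, the delta-rule habit update, the learning rate `alpha`, and the function names `rate_correlation`, `update_habit`, and `total_strength` are all assumptions made for illustration only.

```python
from statistics import mean


def rate_correlation(actions, outcomes):
    """Pearson correlation between action and outcome counts per time sample.

    Assumption: stands in for the goal-directed "rate correlation" term.
    Returns 0.0 when the correlation is undefined (too few samples or no variance).
    """
    n = len(actions)
    if n != len(outcomes) or n < 2:
        return 0.0
    mean_a, mean_o = mean(actions), mean(outcomes)
    cov = sum((a - mean_a) * (o - mean_o) for a, o in zip(actions, outcomes))
    var_a = sum((a - mean_a) ** 2 for a in actions)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_a == 0 or var_o == 0:
        return 0.0
    return cov / (var_a * var_o) ** 0.5


def update_habit(h, reinforced, alpha=0.1):
    """Delta-rule update of habit strength after one response.

    Assumption: a simple prediction-error rule stands in for the
    contiguity-based habit system; alpha is a hypothetical learning rate.
    """
    target = 1.0 if reinforced else 0.0
    return h + alpha * (target - h)


def total_strength(g, h):
    """Sum the goal-directed (g) and habitual (h) strengths."""
    return g + h


if __name__ == "__main__":
    # Action and outcome counts in four successive time samples (made-up data).
    g = rate_correlation([1, 2, 3, 4], [2, 4, 6, 8])
    h = update_habit(0.0, reinforced=True)
    print(total_strength(g, h))
```

In this sketch, a schedule on which outcome counts do not vary with action counts yields a correlation of zero, so only the habit term contributes to the total; this mirrors the qualitative claim in the abstract without reproducing its quantitative form.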
