Habits without Values

Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning (RL) mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally.
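To make the central idea concrete, here is a minimal sketch of a value-free habit update in Python. It assumes one simple form of "direct strengthening of recently taken actions": after each choice, the habit strength of the chosen action moves toward 1 and the strengths of all other actions decay toward 0. The function name `update_habits`, the learning rate `alpha`, and the 0/1 target are illustrative assumptions, not the authors' implementation. The point to notice is that no reward or outcome term appears anywhere in the update.

```python
def update_habits(habits, chosen, alpha=0.1):
    """Return new habit strengths after one action is taken.

    habits: list of current habit strengths, one per action (hypothetical representation)
    chosen: index of the action just taken
    alpha:  assumed habit learning rate in [0, 1]

    The chosen action's strength moves toward 1; all others move toward 0.
    No reward or outcome enters the update, so it is value-free.
    """
    return [
        h + alpha * ((1.0 if i == chosen else 0.0) - h)
        for i, h in enumerate(habits)
    ]


def run_choices(choices, n_actions=2, alpha=0.1):
    """Apply update_habits over a sequence of chosen action indices."""
    habits = [0.0] * n_actions
    for chosen in choices:
        habits = update_habits(habits, chosen, alpha)
    return habits


if __name__ == "__main__":
    # Repeating action 0 strengthens it regardless of what it earns.
    print(run_choices([0] * 20))
```

Because the update depends only on which action was taken, a change in the outcome's worth (devaluation) or in the action-outcome contingency (degradation) has no direct effect on these strengths, which is the kind of insensitivity the abstract describes. The same repetition-driven strengthening would also bias an agent toward repeating its previous choice, in line with the perseveration account.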
