Re-aligning models of habitual and goal-directed decision-making

Abstract: The classic dichotomy between habitual and goal-directed behavior is often mapped onto a dichotomy between model-free and model-based reinforcement learning (RL) algorithms, putatively implemented in segregated neuronal circuits. Although this mapping has had significant heuristic value in motivating experimental investigations, several lines of evidence suggest that it needs to be modified or realigned. First, whereas habitual and goal-directed behaviors have been shown to depend on cleanly separable neural circuitry, recent data suggest that model-based and model-free representations in the brain are largely overlapping. Second, habitual behaviors need not involve representations of expected reinforcement (i.e., need not involve RL, model-free or otherwise) but may instead be based on simple stimulus–response associations. Finally, goal-directed decisions may not reflect a single model-based algorithm but rather a continuum of “model-basedness.” These lines of evidence thus suggest a possible reconceptualization of the distinction between model-free and model-based RL: one in which both contribute to a single goal-directed system that is value-based, as opposed to distinct habitual mechanisms that are value-free. In this chapter, we discuss new models that have extended the RL approach to modeling habitual and goal-directed behavior and assess how these have clarified our understanding of the underlying neural circuitry.
