Habits as action sequences: hierarchical action control and changes in outcome value

Goal-directed action involves making high-level choices that are implemented using previously acquired action sequences to attain desired goals. Such a hierarchical schema is necessary for goal-directed actions to be scalable to real-life situations, but it results in decision-making that is less flexible than when action sequences are unfolded and the decision-maker deliberates step-by-step over the outcome of each individual action. In particular, from this perspective, the offline revaluation of any outcomes that fall within action sequence boundaries will be invisible to the high-level planner, resulting in decisions that are insensitive to such changes. Here, within the context of a two-stage decision-making task, we demonstrate that this property can explain the emergence of habits. Next, we show how this hierarchical account explains the insensitivity of over-trained actions to changes in outcome value. Finally, we provide new data showing that, under extended extinction conditions, habitual behaviour can revert to goal-directed control, presumably as a consequence of decomposing action sequences into single actions. This hierarchical view suggests that the development of action sequences and the insensitivity of actions to changes in outcome value are essentially two sides of the same coin, explaining why these two aspects of automatic behaviour involve a shared neural structure.
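
The core claim, that an outcome revalued offline inside an action sequence is invisible to the high-level planner, can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the model used in the paper: the two-step task layout, the state and action names, the outcome values, and the idea of caching a sequence's value at the moment it is chunked are all hypothetical choices made for illustration.

```python
# Toy illustration only: task layout, names and values are hypothetical.

# Primitive world model: state -> action -> (next_state, outcome).
# next_state is None when the episode ends; outcome is None when nothing is delivered.
MODEL = {
    "start": {"left": ("mid_L", None), "right": ("mid_R", None)},
    "mid_L": {"press": (None, "food_A")},
    "mid_R": {"press": (None, "food_B")},
}


def flat_value(state, action, outcome_values):
    """Step-by-step evaluation: unfold every action using CURRENT outcome values."""
    next_state, outcome = MODEL[state][action]
    reward = outcome_values[outcome] if outcome is not None else 0.0
    if next_state is None:
        return reward
    best_next = max(flat_value(next_state, a, outcome_values) for a in MODEL[next_state])
    return reward + best_next


def flat_choice(outcome_values):
    """Deliberative choice of the first action from the start state."""
    return max(MODEL["start"], key=lambda a: flat_value("start", a, outcome_values))


def build_chunks(outcome_values):
    """Form two-step sequences and cache their value at chunking time (assumption)."""
    return {
        (first, "press"): flat_value("start", first, outcome_values)
        for first in MODEL["start"]
    }


def hierarchical_choice(chunks):
    """High-level choice over whole sequences, using only the cached values."""
    return max(chunks, key=chunks.get)


# Training: food_A is the better outcome, and sequences are chunked under these values.
values_before = {"food_A": 1.0, "food_B": 0.6}
chunks = build_chunks(values_before)

# Offline devaluation of food_A: no sequence is executed, so no cached value changes.
values_after = {"food_A": 0.0, "food_B": 0.6}

flat_after = flat_choice(values_after)      # re-evaluates each step
habit_after = hierarchical_choice(chunks)   # still relies on cached sequence values

print("flat planner after devaluation:", flat_after)
print("hierarchical planner after devaluation:", habit_after)
```

In this sketch the step-by-step planner switches away from the devalued outcome, whereas the planner that selects whole sequences keeps choosing the sequence that ends in it, because the devalued outcome lies inside the sequence boundary. Decomposing the sequence back into single actions (here, calling `flat_choice` again) restores sensitivity to the new value.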
