Hierarchical control of goal-directed action in the cortical–basal ganglia network

Goal-directed control depends on constructing a model of the world that maps actions onto specific outcomes, allowing choice to remain adaptive when the values of those outcomes change. In complex environments, however, such models can become computationally unwieldy. One solution to this problem is to develop a hierarchical control structure in which more complex, or abstract, actions are built from simpler ones. Here we review findings suggesting that the acquisition, evaluation and execution of goal-directed actions accord well with the predictions of hierarchical models. We describe recent evidence that hierarchical action control is implemented in a series of feedback loops integrating secondary motor areas with the basal ganglia, and we explain how such a structure not only overcomes issues of dimensionality but also helps to account for the formation of action sequences, action chunking and the relationship between goal-directed actions and habits.
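The contrast between model-based evaluation and chunked sequences can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' model. The two-lever task, its outcome probabilities, the learning rate, and the names `goal_directed_value`, `Chunk` and `n_plans` are all hypothetical. Goal-directed evaluation recomputes an action's value on demand from an action-outcome model and the current outcome values. A chunk, by contrast, is a fixed sequence executed as a single unit whose value is cached during training. Devaluing an outcome therefore changes the goal-directed value immediately, while the cached chunk value stays put, which is one way to read the link between chunking and habits. The last function shows the dimensionality argument: exhaustive search grows as options raised to the planning depth, so composing behaviour from a few chunks shrinks the search.

```python
# Hypothetical toy task: each lever yields one food outcome with p = 0.8.
ACTION_OUTCOME = {
    "left_lever": {"pellet": 0.8, "none": 0.2},
    "right_lever": {"sucrose": 0.8, "none": 0.2},
}


def goal_directed_value(action, outcome_values):
    # Online evaluation: sum over outcomes of P(outcome | action) * current value.
    return sum(p * outcome_values[o] for o, p in ACTION_OUTCOME[action].items())


class Chunk:
    """Hypothetical macro-action: a fixed sequence of primitives run as one unit."""

    def __init__(self, steps):
        self.steps = tuple(steps)
        self.cached_value = 0.0

    def learn(self, outcome_values, lr=0.5, trials=20):
        # Model-free caching (assumption): the stored value is nudged toward
        # what the whole sequence earned on each trial, then kept as a number.
        for _ in range(trials):
            earned = sum(goal_directed_value(a, outcome_values) for a in self.steps)
            self.cached_value += lr * (earned - self.cached_value)


def n_plans(n_options, depth):
    # Number of distinct open-loop plans an exhaustive search must consider.
    return n_options ** depth


values = {"pellet": 1.0, "sucrose": 1.0, "none": 0.0}
chunk = Chunk(["left_lever", "left_lever"])
chunk.learn(values)

# Outcome devaluation (e.g. specific satiety): pellets are now worthless.
values["pellet"] = 0.0

print(goal_directed_value("left_lever", values))  # 0.0: the model re-evaluates at once
print(round(chunk.cached_value, 3))               # 1.6: the cached chunk value is unchanged
print(n_plans(4, 6), n_plans(4, 3))               # 4096 primitive plans vs 64 chunk-level plans
```

In this sketch the chunk only "relearns" if `learn` is called again under the new values, mirroring the claim that chunked behaviour is insensitive to devaluation until it is re-experienced. The 4-option, 3-chunk comparison assumes a chunk library the same size as the primitive set, chosen purely for illustration.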
