Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning

The most difficult computation the brain faces in learning to make decisions is finding reward-related information among vast multimodal inputs and integrating it into beneficial behavior. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions through internal representations such as reward prediction and reward prediction error. Previous studies have suggested that the neural circuitry of the multiple cortico-striatal loops is a suitable brain structure for such integration. However, how information in and around these multiple closed loops is shared and transferred remains to be explored computationally. Here, we propose a "heterarchical reinforcement learning" model, in which reward prediction made by the more limbic and cognitive loops is propagated to the motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning. Activity in the caudate nucleus and cognitive cortical areas is predicted to correlate with reward prediction error, while activity in the putamen and motor-related areas is predicted to correlate with stimulus-action-dependent reward prediction. Furthermore, the model predicts a heterogeneous activity pattern within the striatum that depends on learning difficulty: the anterior medial caudate nucleus will correlate more strongly with reward prediction error when learning becomes difficult, while the posterior putamen will correlate more strongly with stimulus-action-dependent reward prediction when learning is easy. Our fMRI results revealed that different cortico-striatal loops are operating, as suggested by the proposed model.
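The two quantities the abstract builds on, stimulus-action-dependent reward prediction and reward prediction error, can be illustrated with a generic tabular learner. The following is a minimal sketch under stated assumptions, not the heterarchical model itself: it does not represent separate cortico-striatal loops or the propagation of predictions between them. The function name `simulate`, the learning rate `alpha`, the exploration rate `epsilon`, and the reward probability table are all hypothetical choices made for illustration.

```python
import random


def simulate(reward_prob, n_trials=200, alpha=0.1, epsilon=0.1, seed=0):
    """Learn stimulus-action reward predictions from prediction errors.

    reward_prob[s][a] is the (assumed) probability that action a is
    rewarded when stimulus s is shown.
    """
    rng = random.Random(seed)
    n_stimuli = len(reward_prob)
    n_actions = len(reward_prob[0])
    # q[s][a]: reward prediction for taking action a under stimulus s.
    q = [[0.0] * n_actions for _ in range(n_stimuli)]
    errors = []
    for _ in range(n_trials):
        s = rng.randrange(n_stimuli)  # stimulus shown on this trial
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # occasional random exploration
        else:
            a = max(range(n_actions), key=lambda i: q[s][i])  # greedy choice
        r = 1.0 if rng.random() < reward_prob[s][a] else 0.0
        delta = r - q[s][a]  # reward prediction error
        q[s][a] += alpha * delta  # move the prediction toward the outcome
        errors.append(delta)
    return q, errors


if __name__ == "__main__":
    # Two stimuli, two actions; each stimulus has one mostly rewarded action.
    predictions, prediction_errors = simulate([[0.9, 0.1], [0.1, 0.9]])
    print(predictions)
```

In an analysis of the kind the abstract describes, per-trial values such as `q[s][a]` and `delta` would serve as candidate regressors against the fMRI signal; how the original study constructed its regressors is not stated here.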
