A Computational Model of Integration between Reinforcement Learning and Task Monitoring in the Prefrontal Cortex

Taking inspiration from neural principles of decision-making is of particular interest for improving the adaptivity of artificial systems. Research at the crossroads of neuroscience and artificial intelligence over the last decade has helped clarify how the brain organizes reinforcement learning (RL) processes, i.e., the adaptation of decisions based on feedback from the environment. The challenge now is to understand how the brain flexibly regulates RL parameters, such as the exploration rate, based on the task structure, a process called meta-learning (Doya, 2002) [9]. Here, we propose a computational mechanism of exploration regulation based on neurophysiological and behavioral data recorded in the monkey prefrontal cortex during a visuo-motor task involving a clear distinction between exploratory and exploitative actions. We first fit the monkeys' trial-by-trial choices with an analytical reinforcement learning model. We find that the model with the highest likelihood of predicting the monkeys' choices reveals different exploration rates at different task phases. In addition, the optimized model has a very high learning rate and resets action values whenever a cue used in the task signals a condition change. Beyond classical RL mechanisms, these results suggest that the monkey brain extracts task regularities to tune learning parameters in a task-appropriate way. We finally use these principles to develop a neural network model extending a previous cortico-striatal loop model. In its prefrontal cortex component, prediction error signals are extracted to produce feedback categorization signals. The latter are used to boost exploration after errors and to attenuate it during exploitation, ensuring a lock on the currently rewarded choice. This model performs the task as the monkeys do, and provides a set of experimental predictions to be tested by future neurophysiological recordings.
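The regulation principle described above can be illustrated with a minimal sketch. This is not the paper's actual model (which is an analytical model fit to monkey choices and a cortico-striatal network): it is a generic Q-learning agent with softmax action selection in which a feedback categorization signal switches the inverse temperature between an exploratory and an exploitative regime, and a change cue resets action values. All class and parameter names, and the specific parameter values, are illustrative assumptions.

```python
import math
import random

class MetaRLAgent:
    """Illustrative sketch (not the authors' exact model): Q-learning with
    softmax exploration whose inverse temperature is raised after rewarded
    trials (locking onto the currently rewarded choice) and lowered after
    errors (boosting exploration), plus a value reset on a change cue."""

    def __init__(self, n_actions, alpha=0.9,
                 beta_explore=1.0, beta_exploit=10.0):
        self.n = n_actions
        self.alpha = alpha                # high learning rate, as in the fitted model
        self.beta_explore = beta_explore  # low inverse temperature: explore
        self.beta_exploit = beta_exploit  # high inverse temperature: exploit
        self.beta = beta_explore          # start in the exploratory regime
        self.q = [0.0] * n_actions        # action values

    def softmax_probs(self):
        # numerically stable softmax over beta-scaled action values
        m = max(self.beta * q for q in self.q)
        exps = [math.exp(self.beta * q - m) for q in self.q]
        s = sum(exps)
        return [e / s for e in exps]

    def act(self):
        # sample an action from the softmax distribution
        r, acc = random.random(), 0.0
        for a, pa in enumerate(self.softmax_probs()):
            acc += pa
            if r < acc:
                return a
        return self.n - 1

    def update(self, action, reward):
        # the reward prediction error drives value learning; its sign is
        # used as a feedback categorization signal that switches the
        # exploration rate between the two regimes
        delta = reward - self.q[action]
        self.q[action] += self.alpha * delta
        self.beta = self.beta_exploit if reward > 0 else self.beta_explore

    def reset_values(self):
        # triggered by the cue signalling a condition change
        self.q = [0.0] * self.n
```

With a high learning rate and the reset, the agent re-enters the exploratory regime immediately after the first unrewarded trial of a new condition, mirroring the behavioral phases distinguished in the task.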

[1] R. S. Sutton et al., Introduction to Reinforcement Learning, 1998.

[2] E. Procyk et al., Anterior cingulate activity during routine and non-routine sequential behaviors in macaques, Nature Neuroscience, 2000.

[3] M. S. Gilzenrat et al., A Systems-Level Perspective on Attention and Cognitive Control: Guided Activation, Adaptive Gating, Conflict Monitoring, and Exploitation versus Exploration, 2004.

[4] P. Redgrave et al., A computational model of action selection in the basal ganglia. I. A new functional anatomy, Biological Cybernetics, 2001.

[5] D. Barraclough et al., Prefrontal cortex and decision making in a mixed-strategy game, Nature Neuroscience, 2004.

[6] P. F. Dominey et al., A Model of Corticostriatal Plasticity for Learning Oculomotor Associations and Sequences, Journal of Cognitive Neuroscience, 1995.

[7] K. Tanaka et al., Medial prefrontal cell activity signaling prediction errors of action values, Nature Neuroscience, 2007.

[8] E. Procyk et al., Behavioral Shifts and Action Valuation in the Anterior Cingulate Cortex, Neuron, 2008.

[9] K. Doya, Metalearning and neuromodulation, Neural Networks, 2002.

[10] J. D. Cohen et al., Adaptive gain and the role of the locus coeruleus–norepinephrine system in optimal performance, The Journal of Comparative Neurology, 2005.

[11] R. S. Sutton et al., Reinforcement Learning: An Introduction, IEEE Trans. Neural Networks, 1998.

[12] M. Khamassi et al., Combining Self-organizing Maps with Mixtures of Experts: Application to an Actor-Critic Model of Reinforcement Learning in the Basal Ganglia, SAB, 2006.

[13] S. Dehaene et al., A neuronal model of a global workspace in effortful cognitive tasks, Proceedings of the National Academy of Sciences of the United States of America, 1998.

[14] A. Guillot et al., A basal ganglia inspired model of action selection evaluated in a robotic survival task, Journal of Integrative Neuroscience, 2003.

[15] K. M. Visscher et al., A Core System for the Implementation of Task Sets, Neuron, 2006.

[16] M. Posner, The Cognitive Neuroscience of Attention, 2020.

[17] P. Goldman-Rakic et al., Modulation of Dorsolateral Prefrontal Delay Activity during Self-Organized Behavior, The Journal of Neuroscience, 2006.

[18] J. W. Brown et al., Learned Predictions of Error Likelihood in the Anterior Cingulate Cortex, Science, 2005.

[19] P. Dayan et al., A Neural Substrate of Prediction and Reward, Science, 1997.