Robot Cognitive Control with a Neurophysiologically Inspired Reinforcement Learning Model

A major challenge in modern robotics is to liberate robots from controlled industrial settings and allow them to interact with humans and changing environments in the real world. The current research attempts to determine whether a neurophysiologically motivated model of cortical function in the primate can help to address this challenge. Primates are endowed with cognitive systems that allow them to maximize the feedback from their environment by learning the values of actions in diverse situations and by adjusting their behavioral parameters (i.e., cognitive control) to accommodate unexpected events. In such contexts, uncertainty can arise from at least two distinct sources: expected uncertainty, resulting from noise during sensory-motor interaction in a known context, and unexpected uncertainty, resulting from the changing probabilistic structure of the environment. However, it is not clear how neurophysiological mechanisms of reinforcement learning and cognitive control integrate in the brain to produce efficient behavior. Based on primate neuroanatomy and neurophysiology, we propose a novel computational model for the interaction between lateral prefrontal and anterior cingulate cortex, reconciling previous models dedicated to these two functions. We deployed the model in two robots and demonstrate that, based on adaptive regulation of a meta-parameter β that controls the exploration rate, the model can robustly deal with both kinds of uncertainty in the real world. In addition, the model could reproduce monkey behavioral performance and neurophysiological data in two problem-solving tasks. A final experiment extends this to human–robot interaction with the iCub humanoid, and to novel sources of uncertainty corresponding to “cheating” by the human. The combined results provide concrete evidence for the ability of neurophysiologically inspired cognitive systems to control advanced robots in the real world.
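
As a rough illustration of how a meta-parameter β can control the exploration rate, the sketch below assumes that β acts as the inverse temperature of a softmax over learned action values, a common choice in reinforcement learning. The abstract does not specify the model's equations, so the function names, the learning rate, and in particular the β regulation rule are hypothetical and are not the authors' implementation.

```python
import math

def softmax_probs(values, beta):
    """Action probabilities under a softmax with inverse temperature beta.

    Low beta gives near-uniform choice (exploration); high beta
    concentrates choice on the highest-valued action (exploitation).
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(q, reward, alpha=0.1):
    """Standard delta-rule update of one action value (alpha is assumed)."""
    return q + alpha * (reward - q)

def regulate_beta(beta, reward, avg_reward, step=0.5,
                  beta_min=0.5, beta_max=10.0):
    """Hypothetical regulation rule, for illustration only.

    Raise beta (exploit more) when the outcome is at least as good as the
    running average; lower it (explore more) when it is worse.
    """
    delta = step if reward >= avg_reward else -step
    return min(beta_max, max(beta_min, beta + delta))

if __name__ == "__main__":
    q_values = [0.8, 0.2, 0.1]
    for beta in (0.0, 1.0, 5.0):
        print(beta, [round(p, 3) for p in softmax_probs(q_values, beta)])
```

Under this assumption, a run of worse-than-expected outcomes, such as after an unsignaled change in the task, lowers β and flattens the choice distribution, while consistent success raises β and stabilizes the current best action.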
