Use of Frontal Lobe Hemodynamics as Reinforcement Signals to an Adaptive Controller

Decision-making in the frontal lobe (among other brain structures) relies on assigning value to states of the animal and its environment, so that higher-valued states can be pursued and lower-valued (or negatively valued) states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptive artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrates that both oxy- and deoxyhemoglobin concentrations in the frontal lobe differ in response to primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. Because the agent is adaptive, its task performance can exceed the decoding accuracy of the NIRS classifier. We also show that state desirability is decoded more accurately from the relative concentrations of both oxyhemoglobin and deoxyhemoglobin than from either species alone.
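The claim that an adaptive agent can outperform its own imperfect reinforcer can be illustrated with a toy simulation. The sketch below is not the controller, task, or classifier used in this work: it assumes a hypothetical single-state task with four actions, one of which is desirable, and replaces the NIRS classifier with a simulated binary reinforcer that reports the correct label with a fixed probability (`accuracy`, set to 0.7 here as an arbitrary assumption). All function and parameter names are illustrative.

```python
import random


def noisy_reinforcer(is_desirable, accuracy, rng):
    """Simulated desirability decoder (hypothetical stand-in for a classifier).

    Returns +1 for "desirable" and -1 for "undesirable", but reports the
    wrong label with probability 1 - accuracy.
    """
    true_label = 1.0 if is_desirable else -1.0
    return true_label if rng.random() < accuracy else -true_label


def train_agent(n_actions=4, good_action=2, accuracy=0.7, trials=5000,
                alpha=0.02, epsilon=0.1, eval_window=1000, seed=0):
    """Epsilon-greedy value learner driven only by the noisy reinforcer."""
    rng = random.Random(seed)
    values = [0.0] * n_actions  # running value estimate per action
    hits = 0                    # desirable choices within the final window
    for t in range(trials):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = noisy_reinforcer(action == good_action, accuracy, rng)
        # Incremental update: move the estimate a small step toward the reward,
        # which averages out individual decoding errors over many trials.
        values[action] += alpha * (reward - values[action])
        if t >= trials - eval_window and action == good_action:
            hits += 1
    return values, hits / eval_window


values, success_rate = train_agent()
greedy_action = max(range(len(values)), key=lambda a: values[a])
print("greedy action:", greedy_action)
print("late success rate:", round(success_rate, 3))
```

Because each value estimate averages many noisy reinforcements, individual decoding errors tend to cancel, so the fraction of desirable choices late in training is limited mainly by the exploration rate rather than by the reinforcer's single-trial accuracy.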
