A Symbiotic Brain-Machine Interface through Value-Based Decision Making

Background
Brain-Machine Interfaces (BMIs) must let users interact with changing environments during the activities of daily life. The number and scope of the learning tasks encountered during that interaction, as well as the pattern of brain activity, are expected to vary over time. These conditions, together with neural reorganization, make it difficult to decode neural commands for BMIs. We have developed a new BMI framework in which a computational agent symbiotically decodes users' intended actions by using both motor commands and goal information taken directly from the brain through a continuous Perception-Action-Reward Cycle (PARC).

Methodology
The control architecture was based on Actor-Critic learning, a PARC-based reinforcement learning method. Our neurophysiology studies in rat models suggested that the Nucleus Accumbens (NAcc) contains a rich representation of goal information, in terms of predicting the probability of earning reward, and that this representation can be translated with high precision into an evaluative feedback for adapting the decoder. Simulated neural control experiments showed that the system maintained high performance in decoding neural motor commands during novel tasks and in the presence of reorganization in the neural input. We then implanted a dual micro-wire array in the primary motor cortex (M1) and the NAcc of the rat brain and implemented a full closed-loop system in which robot actions were decoded from single-unit activity in M1, based on an evaluative feedback estimated from the NAcc. During closed-loop control, the agent was able to solve a reaching task by capturing the interdependency between action and reward in the brain.

Conclusions
Our results suggest that adapting the BMI decoder with an evaluative feedback extracted directly from the brain is a possible solution to the problem of operating BMIs in changing environments with dynamic neural signals.
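
As a rough illustration of the actor-critic idea described in the abstract, the following minimal sketch adapts a decoder (the actor) using only a scalar evaluative feedback. Everything in it is an assumption made for illustration and not the authors' implementation: the `ActorDecoder` class, the linear-softmax policy, the made-up firing-rate patterns, and the simulated +1/-1 feedback, which stands in for the evaluative signal that the paper estimates from NAcc activity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ActorDecoder:
    """Hypothetical actor: linear-softmax map from firing rates to actions."""

    def __init__(self, n_inputs, n_actions, lr=0.2, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]

    def action_probs(self, rates):
        scores = [sum(wi * r for wi, r in zip(row, rates)) for row in self.w]
        return softmax(scores)

    def sample_action(self, rates):
        # Stochastic action selection keeps the decoder exploring.
        probs = self.action_probs(rates)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def greedy_action(self, rates):
        probs = self.action_probs(rates)
        return max(range(len(probs)), key=probs.__getitem__)

    def adapt(self, rates, action, feedback):
        # Policy-gradient style update scaled by the evaluative feedback:
        # positive feedback reinforces the chosen action, negative weakens it.
        probs = self.action_probs(rates)
        for a, row in enumerate(self.w):
            grad = (1.0 if a == action else 0.0) - probs[a]
            for i, r in enumerate(rates):
                row[i] += self.lr * feedback * grad * r


def simulate(n_trials=500, seed=0):
    """Run simulated trials; the feedback is a stand-in for a critic signal."""
    rng = random.Random(seed)
    # Made-up "motor" firing-rate patterns for two intended actions.
    patterns = {0: [1.0, 0.1, 0.0], 1: [0.0, 0.1, 1.0]}
    actor = ActorDecoder(n_inputs=3, n_actions=2, seed=seed)
    for _ in range(n_trials):
        intended = rng.randrange(2)
        rates = patterns[intended]
        action = actor.sample_action(rates)
        # Assumed evaluative feedback: +1 if the action matched the intent.
        feedback = 1.0 if action == intended else -1.0
        actor.adapt(rates, action, feedback)
    return actor, patterns
```

Calling `simulate()` returns the adapted actor and the input patterns, and `actor.greedy_action(patterns[0])` then gives the decoded action for the first pattern. The point of the design is that the decoder never sees a labeled target action, only a scalar evaluation of the action it took, which is what lets such a scheme keep adapting when the task or the neural input changes.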
