Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization

Brain-machine interface (BMI) systems give users direct neural control of robotic, communication, or functional electrical stimulation systems. As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with minimal burden on the user, provide stable control for long periods of time, and respond to fluctuations in the decoder’s neural input space (e.g., neurons appearing or being lost amongst electrode recordings). These are significant challenges for static neural decoding algorithms that assume stationary input/output relationships. Here we use an actor-critic reinforcement learning architecture to provide an adaptive BMI controller that adapts to dramatic neural reorganizations, maintains its performance over long time periods, and does not require the user to produce specific kinetic or kinematic activities for calibration. Two marmoset monkeys used the Reinforcement Learning BMI (RLBMI) to successfully control a robotic arm during a two-target reaching task. The RLBMI was initialized using random initial conditions, and it quickly learned to control the robot from brain states using only binary evaluative feedback on whether previously chosen robot actions were good or bad. The RLBMI maintained control over the system throughout sessions spanning multiple weeks. Furthermore, it quickly adapted and maintained control of the robot despite dramatic perturbations to the neural inputs, including a series of tests in which the neuron input space was deliberately halved or doubled.
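
To make the idea of learning from binary evaluative feedback concrete, the following is a minimal illustrative sketch and not the RLBMI implementation described in the paper. It assumes synthetic firing rates, a linear softmax actor trained with a simple policy-gradient rule, and a stand-in critic that returns +1 or -1 according to whether the chosen action matched the target. The function name `simulate`, the tuning model, the learning rate, and the `drop_at` perturbation are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n_neurons=20, n_trials=400, lr=0.05, seed=0, drop_at=None):
    """Toy two-target task: an actor learns from +1/-1 feedback only.

    All modelling choices here are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical tuning: each synthetic neuron has a mean rate per target.
    tuning = rng.normal(0.0, 1.0, size=(2, n_neurons))
    # Random initial conditions for the actor weights.
    weights = rng.normal(0.0, 0.01, size=(n_neurons, 2))
    active = np.ones(n_neurons)
    outcomes = []
    for t in range(n_trials):
        if drop_at is not None and t == drop_at:
            # Perturbation: the first half of the inputs is lost.
            active[: n_neurons // 2] = 0.0
        target = rng.integers(2)
        # Noisy "brain state" for the current target.
        x = (tuning[target] + rng.normal(0.0, 0.5, n_neurons)) * active
        logits = x @ weights
        p = np.exp(logits - logits.max())
        p /= p.sum()
        action = rng.choice(2, p=p)
        # Stand-in critic: binary evaluation of the chosen action.
        feedback = 1.0 if action == target else -1.0
        # Policy-gradient step scaled by the binary feedback.
        onehot = np.eye(2)[action]
        weights += lr * feedback * np.outer(x, onehot - p)
        outcomes.append(action == target)
    return np.array(outcomes)

if __name__ == "__main__":
    result = simulate(drop_at=200)
    print("accuracy before input loss:", result[100:200].mean())
    print("accuracy after input loss: ", result[-100:].mean())
```

In this sketch the actor never sees the target label or any kinematic training signal, only whether its last action was judged good or bad, and it keeps updating after half of its inputs are removed. In a real system the feedback would have to come from a critic rather than from ground truth, which this toy example does not model.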

[33]  Joseph E. O’Doherty,et al.  Unscented Kalman Filter for Brain-Machine Interfaces , 2009, PloS one.

[34]  Cuntai Guan,et al.  Unsupervised Brain Computer Interface Based on Intersubject Information and Online Adaptation , 2009, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[35]  José Carlos Príncipe,et al.  Coadaptive Brain–Machine Interface via Reinforcement Learning , 2009, IEEE Transactions on Biomedical Engineering.

[36]  J. Carmena,et al.  Emergence of a Stable Cortical Map for Neuroprosthetic Control , 2009, PLoS biology.

[37]  Ikuko Tanaka,et al.  Stereo Navi 2.0: Software for stereotaxic surgery of the common marmoset (Callithrix jacchus) , 2009, Neuroscience Research.

[38]  W. A. Sarnacki,et al.  Electroencephalographic (EEG) control of three-dimensional movement , 2010, Journal of neural engineering.

[39]  Iñaki Iturrate,et al.  Robot reinforcement learning using EEG-based reward signals , 2010, 2010 IEEE International Conference on Robotics and Automation.

[40]  Yasuhiro Wada,et al.  Adaptive Classification for Brain-Machine Interface with Reinforcement Learning , 2011, ICONIP.

[41]  Christa Neuper,et al.  Error potential detection during continuous movement of an artificial arm controlled by brain–computer interface , 2012, Medical & Biological Engineering & Computing.

[42]  Vicenç Gómez,et al.  On the use of interaction error potentials for adaptive brain computer interfaces , 2011, Neural Networks.

[43]  Farbod Fahimi,et al.  Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning , 2011, 2011 IEEE International Conference on Rehabilitation Robotics.

[44]  Miguel A. L. Nicolelis,et al.  Adaptive Decoding for Brain-Machine Interfaces Through Bayesian Parameter Updates , 2011, Neural Computation.

[45]  Motoaki Kawanabe,et al.  Toward Unsupervised Adaptation of LDA for Brain–Computer Interfaces , 2011, IEEE Transactions on Biomedical Engineering.

[46]  Feng Li,et al.  Semi-supervised joint spatio-temporal feature selection for P300-based BCI speller , 2011, Cognitive Neurodynamics.

[47]  Klaus-Robert Müller,et al.  Co-adaptive calibration to improve BCI efficiency , 2011, Journal of neural engineering.

[48]  Michael J. Black,et al.  Neural control of cursor trajectory and click by a human with tetraplegia 1000 days after implant of an intracortical microelectrode array , 2011 .

[49]  Dragan F. Dimitrov,et al.  Reversible large-scale modification of cortical networks during neuroprosthetic control , 2011, Nature Neuroscience.

[50]  Klaus-Robert Müller,et al.  Machine-Learning-Based Coadaptive Calibration for Brain-Computer Interfaces , 2011, Neural Computation.

[51]  Justin C. Sanchez,et al.  A Symbiotic Brain-Machine Interface through Value-Based Decision Making , 2011, PloS one.

[52]  Vikash Gilja,et al.  Long-term Stability of Neural Prosthetic Control Signals from Silicon Cortical Arrays in Rhesus Macaque Motor Cortex , 2010 .

[53]  John P. Cunningham,et al.  A High-Performance Neural Prosthesis Enabled by Control Algorithm Design , 2012, Nature Neuroscience.

[54]  Nicolas Y. Masse,et al.  Reach and grasp by people with tetraplegia using a neurally controlled robotic arm , 2012, Nature.

[55]  Justin C. Sanchez,et al.  Comprehensive characterization and failure modes of tungsten microwire arrays in chronic neural implants , 2012, Journal of neural engineering.

[56]  Andreas Schulze-Bonhage,et al.  Error-related electrocorticographic activity in humans during continuous movements , 2012, Journal of neural engineering.

[57]  A. Schwartz,et al.  Behavioral and neural correlates of visuomotor adaptation observed through a brain-computer interface in primary motor cortex. , 2012, Journal of neurophysiology.

[58]  J. M. Carmena,et al.  Closed-Loop Decoder Adaptation on Intermediate Time-Scales Facilitates Rapid BMI Performance Improvements Independent of Decoder Initialization Conditions , 2012, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[59]  Justin C. Sanchez,et al.  Brain-machine interface control of a robot arm using actor-critic rainforcement learning , 2012, 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[60]  Michael Petrides,et al.  The marmoset brain in stereotaxic coordinates , 2012 .

[61]  David Sussillo,et al.  A recurrent neural network for closed-loop intracortical brain–machine interface decoders , 2012, Journal of neural engineering.

[62]  Wolfgang Rosenstiel,et al.  Online Adaptation of a c-VEP Brain-Computer Interface(BCI) Based on Error-Related Potentials and Unsupervised Learning , 2012, PloS one.

[63]  Sven Hoffmann,et al.  Predictive information processing in the brain: errors and response monitoring. , 2012, International journal of psychophysiology : official journal of the International Organization of Psychophysiology.

[64]  A. Schwartz,et al.  Recording from the same neurons chronically in motor cortex. , 2012, Journal of neurophysiology.

[65]  Justin C. Sanchez,et al.  Feature extraction and unsupervised classification of neural population reward signals for reinforcement based BMI , 2013, 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[66]  Justin C. Sanchez,et al.  Extraction of error related local field potentials from the striatum during environmental perturbations of a robotic arm , 2013, 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER).

[67]  A. Schwartz,et al.  High-performance neuroprosthetic control by an individual with tetraplegia , 2013, The Lancet.

[68]  Justin C. Sanchez,et al.  Towards autonomous neuroprosthetic control using Hebbian reinforcement learning , 2013, Journal of neural engineering.