Quantized Attention-Gated Kernel Reinforcement Learning for Brain–Machine Interface Decoding

Reinforcement learning (RL)-based decoders in brain–machine interfaces (BMIs) interpret dynamic neural activity without requiring patients’ real limb movements. In conventional RL, the goal state is selected by the user or defined by the physics of the problem, and the decoder finds an optimal policy essentially by assigning credit over time, which is normally very time-consuming. BMI tasks, however, require finding a good policy in very few trials, which imposes a limit on the complexity of the tasks that can be learned before the animal quits. This paper therefore explores the possibility of letting the agent infer potential goals through actions over a space containing multiple objects, using the instantaneous reward to assign credit spatially. A previous method, attention-gated RL, employs a multilayer perceptron trained with backpropagation, but it is prone to becoming trapped in local minima. We propose quantized attention-gated kernel RL (QAGKRL) to avoid local minima in spatial credit assignment and to sparsify the network topology. The experimental results show that QAGKRL achieves higher success rates and more stable performance, indicating a strong decoding ability for the more sophisticated BMI tasks required in clinical applications.
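
To make the idea of spatial credit assignment with a sparse kernel network more concrete, the following is a minimal sketch under stated assumptions, not the QAGKRL algorithm as specified in the paper. It assumes a Gaussian kernel, a quantization rule in the style of quantized kernel least mean square (a new input is merged into its nearest stored center when it lies within a fixed radius), and an update driven only by the instantaneous reward and applied only to the selected action. All names (`QuantizedKernelQ`, `run_demo`), the toy two-cluster task, the +1/-1 reward, and every parameter value are hypothetical choices for illustration.

```python
import math
import random


class QuantizedKernelQ:
    """Toy kernel estimator of the immediate reward of each action.

    Illustrative only: the dictionary of centers grows solely when an
    input is farther than `quant_radius` from every stored center.
    """

    def __init__(self, n_actions, kernel_width=1.0, step_size=0.5, quant_radius=0.5):
        self.n_actions = n_actions
        self.kernel_width = kernel_width
        self.step_size = step_size
        self.quant_radius = quant_radius
        self.centers = []  # stored input vectors (the dictionary)
        self.weights = []  # per center: one coefficient per action

    @staticmethod
    def _distance(x, c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

    def _kernel(self, x, c):
        d = self._distance(x, c)
        return math.exp(-(d * d) / (2.0 * self.kernel_width ** 2))

    def values(self, x):
        """Return the estimated immediate reward of every action at input x."""
        q = [0.0] * self.n_actions
        for c, w in zip(self.centers, self.weights):
            k = self._kernel(x, c)
            for a in range(self.n_actions):
                q[a] += w[a] * k
        return q

    def update(self, x, action, reward):
        """Move the chosen action's estimate toward the instantaneous reward."""
        delta = reward - self.values(x)[action]
        if self.centers:
            dists = [self._distance(x, c) for c in self.centers]
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] <= self.quant_radius:
                # Quantization: reuse the nearest center instead of adding one.
                self.weights[j][action] += self.step_size * delta
                return delta
        # Input is novel: add it as a new center, credited to the chosen action only.
        w = [0.0] * self.n_actions
        w[action] = self.step_size * delta
        self.centers.append(list(x))
        self.weights.append(w)
        return delta


def run_demo(seed=0, n_trials=300, epsilon=0.2):
    """Two input clusters, two actions; reward is +1 for the matching action, else -1."""
    rng = random.Random(seed)
    prototypes = [(0.0, 0.0), (2.0, 2.0)]
    model = QuantizedKernelQ(n_actions=2)
    for _ in range(n_trials):
        target = rng.randrange(2)
        x = [p + rng.uniform(-0.1, 0.1) for p in prototypes[target]]
        q = model.values(x)
        if rng.random() < epsilon:
            action = rng.randrange(2)  # exploratory action
        else:
            action = max(range(2), key=q.__getitem__)  # greedy action
        reward = 1.0 if action == target else -1.0
        model.update(x, action, reward)
    return model


model = run_demo()
print("dictionary size:", len(model.centers))
print("values at (0, 0):", model.values([0.0, 0.0]))
print("values at (2, 2):", model.values([2.0, 2.0]))
```

In this toy setting every sample in a cluster falls within the quantization radius of the first sample stored for that cluster, so the dictionary stays small however many trials are run. That is the sparsification effect the abstract refers to. Because the output is linear in the kernel coefficients, the update is a plain least-mean-square step rather than backpropagation through hidden layers.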
