Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus

OBJECTIVE To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak.
APPROACH Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically-implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times.
MAIN RESULTS A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio.
SIGNIFICANCE The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
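
As a rough illustration of the classification setup described under APPROACH, the sketch below bins each electrode's spike times around a phoneme onset and classifies the resulting feature vector with a nearest-class-mean rule, which is one simple kind of linear classifier. It is not the authors' decoder. The function names, the window and bin width, and the toy data are all hypothetical choices made for this example.

```python
from bisect import bisect_left
from collections import Counter


def binned_counts(spike_times, onset, window=(-0.5, 0.5), bin_width=0.1):
    """Count spikes per time bin around one phoneme onset, per electrode.

    spike_times: one sorted list of spike times (seconds) per electrode.
    window and bin_width are illustrative defaults, not values from the paper.
    Returns a flat list of length n_electrodes * n_bins.
    """
    n_bins = round((window[1] - window[0]) / bin_width)
    # Compute shared bin edges once so adjacent bins never overlap or leave gaps.
    edges = [onset + window[0] + i * bin_width for i in range(n_bins + 1)]
    features = []
    for times in spike_times:
        for b in range(n_bins):
            # Sorted input lets us count spikes in [edges[b], edges[b+1]) by bisection.
            features.append(bisect_left(times, edges[b + 1]) - bisect_left(times, edges[b]))
    return features


def fit_centroids(X, y):
    """Average the feature vectors of each phoneme class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose class mean is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def majority_chance(y):
    """Accuracy of always guessing the most frequent label (one possible chance level)."""
    return Counter(y).most_common(1)[0][1] / len(y)


# Toy usage: two electrodes, one onset at 0.5 s, two 0.5 s bins.
toy_features = binned_counts([[0.05, 0.15, 0.16, 0.95], [0.5]], onset=0.5,
                             window=(-0.5, 0.5), bin_width=0.5)
toy_centroids = fit_centroids([[3, 0], [5, 0], [0, 4]], ["AA", "AA", "IY"])
toy_label = predict(toy_centroids, [3, 1])
```

The abstract does not say how its 6% chance level was computed. The majority-class rate in `majority_chance` is only one common convention, and for 39 unequally frequent phonemes it would exceed the uniform rate of 1/39 (about 2.6%).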
