Brain-to-text: decoding spoken phrases from phone representations in the brain

Whether humans and machines can communicate through cortical activity related to natural speech has long been a matter of speculation. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, decoding continuously spoken speech from the neural substrate associated with speech and language processing has remained an unsolved challenge. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity recorded while speaking into the corresponding textual representation. Our results demonstrate that the system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying the cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
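
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how word and phone error rates are conventionally computed in ASR: the edit (Levenshtein) distance between the reference and the decoded sequence, divided by the reference length. It is not the authors' scoring code; the function names and the example sequences are assumptions made purely for illustration.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the token sequence `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length. Gives the word error
    rate for word tokens and the phone error rate for phone tokens."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted word in a four-word reference.
wer = error_rate("the quick brown fox".split(), "the quick brown dog".split())

# Hypothetical example: one substituted and one inserted phone, five reference phones.
per = error_rate(["DH", "AH", "K", "AE", "T"], ["DH", "AH", "K", "AH", "T", "S"])
```

Because insertions count as errors, an error rate computed this way can exceed 100% when the decoded sequence is much longer than the reference.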
