Continuous speech recognition from ECoG

Continuous speech production is a highly complex process involving many parts of the human brain. To date, no fundamental representation that allows continuous speech to be decoded from neural signals has been presented. Here we show that techniques from automatic speech recognition can be applied to decode a textual representation of spoken words from neural signals. We model phones as the fundamental unit of the speech process in invasively measured brain activity, namely intracranial electrocorticographic (ECoG) recordings. These phone models provide insight into the timing and location of neural processes associated with continuous speech production, and they can be used in a speech recognizer to decode the neural data into text. When the dictionary is restricted to small subsets, word error rates as low as 25% can be achieved. Because the brain activity data sets are fairly small, we also investigate robust, regularized discriminative models as alternatives to Gaussian models.
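The idea of plugging phone models into a recognizer with a restricted dictionary can be illustrated with a toy isolated-word decoder. This is a minimal sketch under stated assumptions, not the recognizer used in the paper: it assumes each frame of ECoG features has already been scored by some phone model (for example a Gaussian) into per-phone log-likelihoods, and that each dictionary word is a fixed phone sequence. The helpers `align_score` and `decode_word` are hypothetical names; they run a left-to-right Viterbi alignment of each word's phones over the frames and return the best-scoring word, with no language model and no continuous (multi-word) search.

```python
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")

# Assumed input format: one dict per time frame mapping phone -> log-likelihood,
# e.g. produced by per-phone Gaussian models applied to ECoG features.
FrameScores = Sequence[Dict[str, float]]


def align_score(frames: FrameScores, phones: Sequence[str]) -> float:
    """Viterbi score of the best left-to-right alignment of `phones` to `frames`.

    Every phone must cover at least one consecutive frame, phones appear in
    order, and the alignment must start in the first phone and end in the last.
    """
    if not phones or len(phones) > len(frames):
        return NEG_INF
    # best[j]: best cumulative log-likelihood with the current frame in phone j
    best = [NEG_INF] * len(phones)
    best[0] = frames[0].get(phones[0], NEG_INF)
    for frame in frames[1:]:
        new = [NEG_INF] * len(phones)
        for j, phone in enumerate(phones):
            stay = best[j]                               # remain in phone j
            advance = best[j - 1] if j > 0 else NEG_INF  # enter from phone j-1
            prev = max(stay, advance)
            if prev > NEG_INF:
                new[j] = prev + frame.get(phone, NEG_INF)
        best = new
    return best[-1]


def decode_word(frames: FrameScores,
                dictionary: Dict[str, List[str]]) -> Tuple[str, Dict[str, float]]:
    """Return the dictionary word with the highest alignment score, plus all scores."""
    scores = {word: align_score(frames, phones) for word, phones in dictionary.items()}
    best_word = max(scores, key=scores.get)
    return best_word, scores


# Toy example with made-up phone labels and scores (four frames).
example_dictionary = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
example_frames = [
    {"y": -3.0, "eh": -3.0, "s": -3.0, "n": -0.5, "ow": -2.0},
    {"y": -3.0, "eh": -2.5, "s": -3.0, "n": -0.7, "ow": -1.5},
    {"y": -3.0, "eh": -2.0, "s": -3.0, "n": -2.0, "ow": -0.4},
    {"y": -3.0, "eh": -3.0, "s": -2.5, "n": -2.5, "ow": -0.3},
]
word, scores = decode_word(example_frames, example_dictionary)
print(word)  # prints: no
```

With a small dictionary only a handful of phone sequences compete for each segment, which is one intuitive reason why restricting the dictionary makes decoding easier; the phone labels and scores above are invented for illustration.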

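The word error rate quoted in the abstract is the standard speech recognition metric: the minimum number of word substitutions, deletions, and insertions needed to turn the recognized word sequence into the reference transcript, divided by the number of reference words. A minimal pure-Python sketch (the function name `word_error_rate` is illustrative, not from the paper) computes it with a word-level Levenshtein distance; a 25% WER corresponds, for example, to one error per four reference words.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein (edit) distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))  # prints: 0.25
```

Because insertions are counted, the WER is not bounded by 100%: a hypothesis with many spurious words can exceed it.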