Towards direct speech synthesis from ECoG: A pilot study

Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real time. In this pilot study with one participant, we demonstrate that intracranial electrocorticography (ECoG) activity from temporal areas can be used to resynthesize speech in real time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.
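The spectrogram reconstruction step and its correlation-based evaluation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic stand-ins for ECoG features and a log-magnitude spectrogram, the mapping is a closed-form ridge regression (the abstract does not specify the model used in the study), and the names `fit_ridge` and `binwise_correlation` are hypothetical. The subsequent waveform synthesis from the reconstructed spectrogram is not shown.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def binwise_correlation(S_true, S_pred):
    """Pearson correlation over time, computed separately for each frequency bin."""
    a = S_true - S_true.mean(axis=0)
    b = S_pred - S_pred.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


# Synthetic stand-ins (assumption): rows are time frames.
rng = np.random.default_rng(0)
n_frames, n_channels, n_bins = 600, 16, 20
X = rng.standard_normal((n_frames, n_channels))      # "neural features" per frame
W_true = rng.standard_normal((n_channels, n_bins))   # hidden linear relationship
S = X @ W_true + 0.5 * rng.standard_normal((n_frames, n_bins))  # "spectrogram"

# Fit on the first 400 frames, reconstruct the held-out 200 frames.
train, test = slice(0, 400), slice(400, 600)
W = fit_ridge(X[train], S[train], alpha=1.0)
S_hat = X[test] @ W

# One correlation value per frequency bin between original and reconstruction.
r = binwise_correlation(S[test], S_hat)
print("mean correlation across bins:", r.mean())
```

On real recordings the correlations would be far lower than on this synthetic example, and their significance would need to be assessed against a chance-level baseline.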
