State-of-the-art Speech Recognition using EEG and Towards Decoding of Speech Spectrum From EEG

In this paper we first demonstrate continuous noisy speech recognition from electroencephalography (EEG) signals on an English vocabulary using several state-of-the-art end-to-end automatic speech recognition (ASR) models, and we report results obtained with EEG data recorded under different experimental conditions. We then demonstrate decoding of the speech spectrum from EEG signals using a long short-term memory (LSTM) based regression model and a generative adversarial network (GAN) based model. Our results demonstrate the feasibility of using EEG signals for continuous noisy speech recognition under different experimental conditions, and we provide preliminary results for the synthesis of speech from EEG features.
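The LSTM-based regression described in the abstract maps a sequence of EEG feature vectors to a sequence of speech-spectrum frames, one frame per time step. The sketch below illustrates that mapping with a minimal single-layer LSTM forward pass in NumPy; the feature dimensions (30-dimensional EEG features, 32 spectral bins), hidden size, and weight initialization are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMRegressor:
    """Single-layer LSTM plus a linear readout, mapping a sequence of EEG
    feature vectors to a sequence of spectral frames (one per time step).
    All dimensions here are illustrative assumptions."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Gate weights stacked row-wise as [input, forget, candidate, output]
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear readout from hidden state to spectral frame
        self.Wy = rng.normal(0.0, 0.1, (out_dim, hidden_dim))
        self.by = np.zeros(out_dim)
        self.hidden_dim = hidden_dim

    def forward(self, x):
        """x: (T, in_dim) EEG features -> (T, out_dim) predicted spectrum."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        out = []
        for t in range(x.shape[0]):
            z = self.W @ np.concatenate([x[t], h]) + self.b
            i = sigmoid(z[:H])         # input gate
            f = sigmoid(z[H:2 * H])    # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:])     # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
            out.append(self.Wy @ h + self.by)
        return np.stack(out)

# Toy usage: 50 frames of 30-dim EEG features -> 50 frames of 32 spectral bins
model = LSTMRegressor(in_dim=30, hidden_dim=64, out_dim=32)
eeg = np.random.default_rng(1).normal(size=(50, 30))
spectrum = model.forward(eeg)
print(spectrum.shape)  # (50, 32)
```

In practice such a model would be trained with a frame-wise regression loss (e.g. mean squared error) against target spectral features extracted from the recorded audio; this sketch shows only the sequence-to-sequence shape of the mapping, not a training procedure.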
