Native Language and Stimuli Signal Prediction from EEG

Understanding the neural processing of natural speech is an important first step toward designing Brain-Computer Interface (BCI) based speech enhancement and speech recognition systems. Complex neural signals such as electroencephalography (EEG) are time-varying and have a non-linear relationship with continuous speech. Linear models can decode stimulus features reliably, but the correlation between the reconstructed signal and the continuous EEG remains low despite attempts at optimization. In the current application, we demonstrate the utility of a Recurrent Neural Network (RNN) model for relating stimulus features such as the envelope and spectrogram to continuous EEG in a cocktail party scenario. We use a Long Short-Term Memory (LSTM) architecture, whose self-connecting loops help preserve past information for predicting future values. Given that predictability plays a critical role in speech comprehension, we posit that such a neural network architecture yields better results. In the attended condition, the LSTM models improve the mean correlation with EEG over linear models by 30% (envelope) and 22% (spectrogram) for native participants, and by 43% and 37% for non-native participants. Finally, we trained a single model to predict the native language of a participant from EEG, which yielded 95% accuracy.
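
To make the modelling setup concrete, below is a minimal sketch (Python with Keras) of an LSTM forward model that maps a windowed stimulus feature, such as the speech envelope, onto multichannel EEG, scored by channel-wise Pearson correlation. The window length, layer sizes, channel count, optimizer, and loss are illustrative assumptions, not the exact configuration used in the study.

```python
# Illustrative sketch only: maps a windowed stimulus feature (e.g. the
# speech envelope) to multichannel EEG with an LSTM. All sizes and
# training choices below are assumptions, not the study's reported setup.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_LAGS = 64       # assumed context window: past stimulus samples per prediction
N_FEATURES = 1    # 1 for the envelope; a spectrogram would use n frequency bands
N_CHANNELS = 64   # assumed EEG channel count

model = keras.Sequential([
    layers.Input(shape=(N_LAGS, N_FEATURES)),
    layers.LSTM(128),          # recurrent loops carry past stimulus context forward
    layers.Dense(N_CHANNELS),  # linear readout to the EEG channels
])
model.compile(optimizer="adam", loss="mae")  # assumed optimizer and loss

def mean_channel_correlation(y_true, y_pred):
    """Average Pearson r between recorded and predicted EEG across channels."""
    return np.mean([np.corrcoef(y_true[:, c], y_pred[:, c])[0, 1]
                    for c in range(y_true.shape[1])])

# Toy usage with random data, just to show the expected tensor shapes.
X = np.random.randn(1000, N_LAGS, N_FEATURES).astype("float32")
y = np.random.randn(1000, N_CHANNELS).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(mean_channel_correlation(y, model.predict(X, verbose=0)))
```

In practice the window would slide along the time-aligned stimulus and EEG recordings, and the same correlation metric supports the linear-versus-LSTM comparison reported above.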
