An LSTM Based Architecture to Relate Speech Stimulus to EEG

Modeling the relationship between natural speech and the recorded electroencephalogram (EEG) helps us understand how the brain processes speech and has applications in neuroscience and brain-computer interfaces. So far, mainly linear models have been used in this context. However, the decoding performance of linear models is limited by the complex and highly nonlinear nature of auditory processing in the human brain. We present a novel Long Short-Term Memory (LSTM)-based architecture as a nonlinear model for the classification problem of whether a given (EEG, speech envelope) pair corresponds, i.e., whether the envelope matches the speech that evoked the EEG. The model maps short segments of the EEG and of the envelope to a common embedding space using a convolutional neural network (CNN) in the EEG path and an LSTM in the speech path; the latter also compensates for the brain response delay. In addition, we use transfer learning to fine-tune the model for each subject. The mean classification accuracy of the proposed model reaches 85%, significantly higher than that of a state-of-the-art CNN-based model (73%) and of the linear model (69%).
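To make the match/mismatch setup concrete, the sketch below pairs a CNN path for the EEG with an LSTM path for the envelope and classifies correspondence via a similarity score in a shared embedding space. This is a minimal PyTorch sketch under assumed hyperparameters; the 64 EEG channels, 32-dimensional embedding, kernel size, and cosine-similarity readout are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MatchMismatchModel(nn.Module):
    """Sketch: map EEG and speech-envelope segments to a common
    embedding space and classify whether they correspond.
    All layer sizes are illustrative assumptions."""

    def __init__(self, eeg_channels=64, emb_dim=32):
        super().__init__()
        # EEG path: 1-D convolution over time across all channels.
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_channels, emb_dim, kernel_size=8),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse time into one embedding
        )
        # Speech path: LSTM over the envelope; its recurrence can also
        # absorb the delay between stimulus and brain response.
        self.env_lstm = nn.LSTM(input_size=1, hidden_size=emb_dim,
                                batch_first=True)
        # Logistic readout on the similarity between the two embeddings.
        self.classifier = nn.Linear(1, 1)

    def forward(self, eeg, env):
        # eeg: (batch, channels, time); env: (batch, time, 1)
        e = self.eeg_cnn(eeg).squeeze(-1)            # (batch, emb_dim)
        _, (h, _) = self.env_lstm(env)
        s = h[-1]                                    # (batch, emb_dim)
        sim = nn.functional.cosine_similarity(e, s).unsqueeze(-1)
        return torch.sigmoid(self.classifier(sim))   # P(match)

model = MatchMismatchModel()
eeg = torch.randn(8, 64, 640)   # e.g. 8 segments of 64-channel EEG
env = torch.randn(8, 640, 1)    # the corresponding envelope segments
p_match = model(eeg, env)       # (8, 1) match probabilities
```

Under this reading, the per-subject transfer learning step would amount to initializing such a network with the subject-independent weights and continuing training on the target subject's data, possibly with some layers frozen.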
