An LSTM Based Architecture to Relate Speech Stimulus to EEG

Modeling the relationship between natural speech and the recorded electroencephalogram (EEG) helps us understand how the brain processes speech and has applications in neuroscience and brain-computer interfaces. So far, mainly linear models have been used in this context. However, the decoding performance of linear models is limited by the complex and highly nonlinear nature of auditory processing in the human brain. We present a novel Long Short-Term Memory (LSTM)-based architecture as a nonlinear model for the classification problem of whether a given (EEG, speech envelope) pair corresponds, i.e., whether the envelope matches the speech that evoked the EEG. The model maps short segments of the EEG and of the envelope to a common embedding space using a convolutional neural network (CNN) in the EEG path and an LSTM in the speech path; the latter also compensates for the brain response delay. In addition, we use transfer learning to fine-tune the model for each subject. The mean classification accuracy of the proposed model reaches 85%, significantly higher than that of a state-of-the-art CNN-based model (73%) and of the linear model (69%).
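To make the match/mismatch setup concrete, the sketch below pairs a CNN path for the EEG with an LSTM path for the envelope and classifies correspondence via a similarity score in a shared embedding space. This is a minimal PyTorch sketch under assumed hyperparameters; the 64 EEG channels, 32-dimensional embedding, kernel size, and cosine-similarity readout are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MatchMismatchModel(nn.Module):
    """Sketch: map EEG and speech-envelope segments to a common
    embedding space and classify whether they correspond.
    All layer sizes are illustrative assumptions."""

    def __init__(self, eeg_channels=64, emb_dim=32):
        super().__init__()
        # EEG path: 1-D convolution over time across all channels.
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_channels, emb_dim, kernel_size=8),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse time into one embedding
        )
        # Speech path: LSTM over the envelope; its recurrence can also
        # absorb the delay between stimulus and brain response.
        self.env_lstm = nn.LSTM(input_size=1, hidden_size=emb_dim,
                                batch_first=True)
        # Logistic readout on the similarity between the two embeddings.
        self.classifier = nn.Linear(1, 1)

    def forward(self, eeg, env):
        # eeg: (batch, channels, time); env: (batch, time, 1)
        e = self.eeg_cnn(eeg).squeeze(-1)            # (batch, emb_dim)
        _, (h, _) = self.env_lstm(env)
        s = h[-1]                                    # (batch, emb_dim)
        sim = nn.functional.cosine_similarity(e, s).unsqueeze(-1)
        return torch.sigmoid(self.classifier(sim))   # P(match)

model = MatchMismatchModel()
eeg = torch.randn(8, 64, 640)   # e.g. 8 segments of 64-channel EEG
env = torch.randn(8, 640, 1)    # the corresponding envelope segments
p_match = model(eeg, env)       # (8, 1) match probabilities
```

Under this reading, the per-subject transfer learning step would amount to initializing such a network with the subject-independent weights and continuing training on the target subject's data, possibly with some layers frozen.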
