EEG-Informed Attended Speaker Extraction From Recorded Speech Mixtures With Application in Neuro-Steered Hearing Prostheses

Objective: We aim to extract and denoise the attended speaker in a noisy two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, complemented with electroencephalography (EEG) recordings to infer the speaker of interest.

Methods: We propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multichannel speech separation and denoising algorithm.

Results: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals.

Conclusions: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments and without access to the clean speech signals for performing EEG-based AAD.

Significance: Current research on AAD typically assumes the availability of the clean speech signals, which limits applicability in real-world settings. We extend this research to detect the attended speaker even when only microphone recordings of noisy speech mixtures are available. This is an enabling ingredient for new brain–computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.
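The modular flow described in the Methods can be caricatured in a few lines of code. The sketch below is purely illustrative and assumes much simpler stand-ins than the paper's actual algorithms: a rectify-and-smooth envelope extractor, correlation-based attention decoding between an EEG-reconstructed envelope and the candidate speech envelopes, and a time-varying gain as a toy substitute for the envelope-informed multichannel Wiener filter. All function names and parameters are hypothetical.

```python
import numpy as np

def extract_envelope(x, win=40):
    """Crude speech envelope: full-wave rectification followed by a
    moving-average lowpass filter (stand-in for auditory-inspired
    envelope extraction)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")

def decode_attention(eeg_envelope, candidate_envelopes):
    """Toy AAD step: the attended speaker is the candidate whose
    envelope correlates best with the envelope reconstructed from
    the EEG. Returns the index of the winning candidate."""
    corrs = [np.corrcoef(eeg_envelope, env)[0, 1]
             for env in candidate_envelopes]
    return int(np.argmax(corrs))

def envelope_informed_gain(mixture, attended_env, floor=0.1):
    """Toy stand-in for the envelope-informed separation/denoising
    stage: attenuate the mixture where the attended envelope is low."""
    g = attended_env / (attended_env.max() + 1e-12)
    return mixture * np.maximum(g, floor)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 20, 4000)
    # Two amplitude-modulated noise carriers as surrogate speakers.
    s1 = rng.standard_normal(4000) * (1 + np.sin(t))
    s2 = rng.standard_normal(4000) * (1 + np.cos(0.85 * t))
    e1, e2 = extract_envelope(s1), extract_envelope(s2)
    # Pretend the EEG decoder reconstructs a noisy copy of envelope 1.
    eeg_env = e1 + 0.3 * rng.standard_normal(4000)
    attended = decode_attention(eeg_env, [e1, e2])
    out = envelope_informed_gain(s1 + s2, [e1, e2][attended])
```

In the paper's actual pipeline each stage is far more involved (e.g. the final stage is a multichannel Wiener filter), but the control flow — envelopes out of the acoustic front-end, a winner chosen by the EEG, the winner steering the filter — is the same.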
