Exploiting Electrophysiological Measures of Semantic Processing for Auditory Attention Decoding

In Auditory Attention Decoding, a user’s electrophysiological brain responses to certain features of speech are modelled and subsequently used to distinguish attended from unattended speech in multi-speaker contexts. Such approaches are frequently based on acoustic features of speech, such as the auditory envelope. A recent paper shows that the brain’s response to a semantic description of narrative speech (i.e., semantic dissimilarity) can also be modelled using such an approach. Here we use the publicly available data accompanying that study to investigate whether combining this semantic dissimilarity feature with an auditory envelope approach improves decoding performance over using the envelope alone. We analyse data from their ‘Cocktail Party’ experiment, in which 33 subjects attended to one of two simultaneously presented audiobook narrations across 30 one-minute fragments. We find that adding the dissimilarity feature to an envelope-based approach significantly increases accuracy, though the increase is small (85.4% to 86.6%). However, we subsequently show that this dissimilarity feature, in which the dissimilarity of each content word relative to its preceding context is tagged to that word’s onset, can be replaced with a binary content-word-onset feature without significantly affecting the results (i.e., modelled responses or accuracy). This calls into question the added value of the dissimilarity information for the approach introduced in that recent paper.
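The difference between the two word-level features described above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `impulse_feature`, the 64 Hz sampling rate, and the onset and dissimilarity values are all hypothetical, chosen only to show that both features place impulses at the same content-word onsets and differ only in impulse height.

```python
from typing import List


def impulse_feature(onsets_s: List[float], values: List[float],
                    duration_s: float, fs: int) -> List[float]:
    """Build a sparse impulse train: zero everywhere except at word onsets.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    n_samples = int(round(duration_s * fs))
    feature = [0.0] * n_samples
    for onset, value in zip(onsets_s, values):
        idx = int(round(onset * fs))  # onset time converted to a sample index
        if 0 <= idx < n_samples:
            feature[idx] = value
    return feature


# Made-up content-word onsets (seconds) and dissimilarity scores.
onsets = [0.50, 1.25, 2.00]
dissimilarity = [0.8, 0.3, 1.1]
fs = 64  # assumed sampling rate in Hz

# Dissimilarity feature: impulse height is the word's dissimilarity score.
dissim_feature = impulse_feature(onsets, dissimilarity, 3.0, fs)

# Binary content-word-onset feature: same onsets, unit height.
binary_feature = impulse_feature(onsets, [1.0] * len(onsets), 3.0, fs)

# Both features are non-zero at exactly the same samples.
nonzero_dissim = [i for i, v in enumerate(dissim_feature) if v != 0.0]
nonzero_binary = [i for i, v in enumerate(binary_feature) if v != 0.0]
```

Because the two vectors share their timing and differ only in amplitude, a model fitted on either one captures the response to content-word onsets; only the dissimilarity version additionally scales that response by the semantic score.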
