Flexible tracking of auditory attention

Auditory selective attention plays a central role in the human capacity to reliably process complex sounds in multi-source environments. The ability to track the attentional state of individuals in such environments is of interest to neuroscientists and engineers because of its importance in the study of attention-related disorders and its potential application in the hearing aid and advertising industries. The neural basis of auditory attention is not well established; however, evidence suggests that the entrainment of cortical activity to the temporal envelope of speech is modulated by attention. Leveraging this finding, we introduce a probabilistic approach based on Hidden Markov Model Regression (HMMR) to decode the attentional state of the listener with respect to a given speech stimulus. Our method is novel in that it uses only the target stream to detect the attended segments, whereas existing methods require knowledge of both the target and the distractor. This is a particular advantage in real-world applications, where the number of sources is often time-varying and unknown to the decoder. We use synthetic data to evaluate robustness and tracking capability, and real electrophysiological data to demonstrate that the proposed method achieves accuracies commensurate with those of brain-computer interface (BCI) systems deployed in the field.
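As a rough illustration of the idea of decoding attention from the target stream alone, the following Python sketch runs a two-state hidden Markov model with regression emissions on synthetic data. It is not the implementation evaluated here. It assumes a scalar surrogate for the speech envelope, a single regression coefficient per state (zero when unattended), Gaussian noise, and known parameters that in practice would have to be estimated. The function name `viterbi_hmm_regression` and all numerical values are hypothetical.

```python
import numpy as np

def viterbi_hmm_regression(x, y, coefs, sigma, trans, init):
    """Most likely state path when y_t = coefs[s_t] * x_t + N(0, sigma^2)."""
    coefs = np.asarray(coefs, dtype=float)
    n_states, T = len(coefs), len(y)
    # Gaussian log-likelihood of every sample under each state's regression.
    resid = y[:, None] - x[:, None] * coefs[None, :]
    loglik = -0.5 * (resid / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    log_trans = np.log(trans)
    delta = np.log(init) + loglik[0]
    back = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[i, j]: move from i to j
        back[t] = np.argmax(scores, axis=0)      # best predecessor of each state
        delta = scores[back[t], np.arange(n_states)] + loglik[t]
    # Trace the best path backwards from the most likely final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Synthetic example (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
T = 600
true_states = np.zeros(T, dtype=int)
true_states[200:400] = 1                         # attended segment
x = rng.standard_normal(T)                       # surrogate target envelope
coefs = [0.0, 1.0]                               # state 0: unattended, state 1: attended
sigma = 0.7
y = np.asarray(coefs)[true_states] * x + sigma * rng.standard_normal(T)

trans = np.array([[0.98, 0.02],
                  [0.02, 0.98]])                 # attention switches rarely
init = np.array([0.5, 0.5])
decoded = viterbi_hmm_regression(x, y, coefs, sigma, trans, init)
accuracy = float(np.mean(decoded == true_states))
print(f"fraction of samples decoded correctly: {accuracy:.3f}")
```

Note that the decoder sees only the target regressor `x` and the response `y`; no distractor signal enters the model, which is the property the approach relies on. The sticky transition matrix encodes the assumption that attentional state changes slowly relative to the sampling rate.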
