Misperceptions Arising from Speech-in-Babble Interactions

The deterioration of speech intelligibility in the presence of other sound sources has been explained in terms of both energetic masking, which renders parts of the speech signal inaudible, and informational masking, in which audible components of the masker interfere with speech identification. The current study focuses on a specific form of informational masking in which audible glimpses of both target and masker combine to produce an incorrect listener percept. We examine a corpus of Spanish word misperceptions that occur when target words are presented in a babble masker. Glimpses originating in both the target and the masker are force-aligned to the reported misperceived word in order to identify the most likely acoustic evidence for the confusion. In this way, the degree of involvement of both target and masker can be quantified. In nearly all cases, the best explanation for the misperception recruits evidence from the babble masker (a type I error), and in more than 80% of the tokens some of the audible target evidence is ignored (a type II error). These findings suggest that misallocation of acoustic-phonetic material plays a significant role in the generation of speech-in-babble confusions.
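The bookkeeping behind the two error types can be illustrated with a minimal sketch. It is not the study's implementation: the 3 dB local-SNR criterion, the `Cell` type, and the functions `label_glimpses` and `classify_confusion` are all assumptions introduced here for illustration. The forced alignment itself is not modelled; the set of time-frequency cells recruited by the best explanation is simply taken as an input.

```python
from dataclasses import dataclass

# Assumed local-SNR criterion for calling a cell a glimpse (not from the source).
GLIMPSE_THRESHOLD_DB = 3.0


@dataclass(frozen=True)
class Cell:
    """One time-frequency cell with hypothetical target and masker levels in dB."""
    target_db: float
    masker_db: float


def label_glimpses(cells, threshold_db=GLIMPSE_THRESHOLD_DB):
    """Label each cell 'target', 'masker', or 'none' by which source dominates."""
    labels = []
    for cell in cells:
        local_snr = cell.target_db - cell.masker_db
        if local_snr >= threshold_db:
            labels.append("target")
        elif local_snr <= -threshold_db:
            labels.append("masker")
        else:
            labels.append("none")  # neither source clearly dominates
    return labels


def classify_confusion(labels, recruited):
    """Summarise target and masker involvement in one misperception.

    `recruited` holds the indices of cells used by the best explanation of
    the reported word (assumed to come from a separate alignment step).
    """
    target_idx = {i for i, lab in enumerate(labels) if lab == "target"}
    masker_idx = {i for i, lab in enumerate(labels) if lab == "masker"}
    recruited = set(recruited)
    masker_used = recruited & masker_idx      # masker evidence recruited
    target_ignored = target_idx - recruited   # audible target evidence unused
    return {
        "type_I": bool(masker_used),
        "type_II": bool(target_ignored),
        "masker_fraction": len(masker_used) / len(recruited) if recruited else 0.0,
        "target_ignored_fraction": (
            len(target_ignored) / len(target_idx) if target_idx else 0.0
        ),
    }


# Toy example with four cells; the levels are invented for illustration.
cells = [Cell(60, 50), Cell(50, 60), Cell(55, 54), Cell(62, 40)]
labels = label_glimpses(cells)
summary = classify_confusion(labels, recruited={0, 1})
```

In this toy case the explanation uses one masker glimpse and leaves one target glimpse unused, so both error types are flagged. Applied over a corpus, the proportions of tokens with `type_I` and `type_II` set would correspond to the kind of figures reported above.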
