Speech intelligibility in the presence of cross-channel spectral asynchrony

The spectrum of spoken sentences was partitioned into quarter-octave channels, and the onset of each channel was shifted in time relative to the others so as to desynchronize spectral information across the frequency axis. Human listeners are remarkably tolerant of cross-channel spectral asynchrony induced in this fashion: speech intelligibility remains relatively unimpaired until the average asynchrony spans three or more phonetic segments. This perceptual robustness is correlated with the magnitude of the low-frequency (3-6 Hz) modulation spectrum and thus highlights the importance of syllabic segmentation and analysis for robust processing of spoken language. High-frequency channels (>1.5 kHz) play a particularly important role when the spectral asynchrony is large enough to significantly reduce the power in the low-frequency modulation spectrum, a condition analogous to acoustic reverberation. This may account for the deterioration of speech intelligibility among the hearing impaired under the conditions of acoustic interference (such as background noise and reverberation) characteristic of the real world.
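The two operations described above can be sketched in code: splitting a signal into quarter-octave bands whose onsets are shifted by random delays before re-summing, and measuring the fraction of envelope-modulation power in the 3-6 Hz band. This is a minimal illustration under stated assumptions, not the authors' stimulus-generation procedure: the 300-3400 Hz filterbank limits, the fourth-order Butterworth bands, the uniformly distributed onset shifts, and all function names are choices made for this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def quarter_octave_edges(f_lo, f_hi):
    """Band edges spaced at quarter-octave intervals: f_lo * 2**(k/4)."""
    edges = [f_lo]
    while edges[-1] * 2 ** 0.25 <= f_hi:
        edges.append(edges[-1] * 2 ** 0.25)
    return edges

def desynchronize(x, fs, max_shift_s, f_lo=300.0, f_hi=3400.0, seed=0):
    """Split x into quarter-octave bands, delay each band by a random
    amount drawn from [0, max_shift_s] seconds, and re-sum the bands."""
    rng = np.random.default_rng(seed)
    edges = quarter_octave_edges(f_lo, f_hi)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        shift = int(rng.uniform(0.0, max_shift_s) * fs)
        out += np.roll(band, shift)  # circular shift; a real stimulus would zero-pad
    return out

def modulation_energy(x, fs, band=(3.0, 6.0)):
    """Fraction of envelope modulation power falling inside `band` (Hz)."""
    env = np.abs(hilbert(x))   # amplitude envelope via the analytic signal
    env = env - env.mean()     # remove DC before taking the spectrum
    power = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    return power[sel].sum() / power[1:].sum()
```

Comparing `modulation_energy` before and after `desynchronize` on a speech waveform would be one way to observe the effect the abstract describes: as `max_shift_s` grows, the per-band delays smear the summed envelope, and the 3-6 Hz modulation power should fall, paralleling the reverberation analogy.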
