Speech Intelligibility is Highly Tolerant of Cross-Channel Spectral Asynchrony
A detailed auditory analysis of the short‐term acoustic spectrum is generally considered essential for understanding spoken language. This assumption is called into question by the results of an experiment in which the spectrum of spoken sentences (from the TIMIT corpus) was partitioned into quarter‐octave channels and the onset of each channel shifted in time relative to the others so as to desynchronize spectral information across the frequency plane. Intelligibility of sentential material (as measured in terms of word accuracy) is unaffected by a (maximum) onset jitter of 80 ms or less and remains high (>75%) even for jitter intervals of 140 ms. Only when the jitter imposed across channels exceeds 220 ms does intelligibility fall below 50%. These results imply that the cues required to understand spoken language are not optimally specified in the short‐term spectral domain, but may rather be based on some other set of representational cues such as the modulation spectrogram [S. Greenberg and B. Kingsbury, Proc. IEEE ICASSP (1997), pp. 1647–1650]. Consistent with this hypothesis is the fact that intelligibility (as a function of onset‐jitter interval) is highly correlated with the magnitude of the modulation spectrum between 3 and 8 Hz.
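The manipulation described above (quarter-octave channels, each delayed by its own onset shift) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not give the filter design, the frequency range, or the distribution of the shifts, so the FFT-mask filterbank, the 100 to 6000 Hz range, the uniform random shifts, and the names `quarter_octave_edges` and `desynchronize` are all assumptions made for the example.

```python
import numpy as np

def quarter_octave_edges(f_lo, f_hi):
    """Band edges spaced a quarter octave apart, starting at f_lo and not exceeding f_hi."""
    n_bands = int(np.floor(4 * np.log2(f_hi / f_lo) + 1e-9))
    return f_lo * 2.0 ** (np.arange(n_bands + 1) / 4.0)

def desynchronize(signal, fs, max_jitter_ms, f_lo=100.0, f_hi=6000.0, seed=0):
    """Delay each quarter-octave channel by its own random onset shift, then sum.

    Assumptions (not from the source): ideal FFT-mask band filters, shifts drawn
    uniformly from 0..max_jitter_ms, and a 100-6000 Hz analysis range.
    Returns the resynthesized signal and the per-channel shifts in samples.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    max_shift = int(round(max_jitter_ms * 1e-3 * fs))

    edges = quarter_octave_edges(f_lo, f_hi)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # One integer delay (in samples) per channel, between 0 and max_shift inclusive.
    shifts = rng.integers(0, max_shift + 1, size=len(edges) - 1)

    # The output is long enough to hold the most-delayed channel.
    out = np.zeros(n + max_shift)
    for lo, hi, shift in zip(edges[:-1], edges[1:], shifts):
        mask = (freqs >= lo) & (freqs < hi)        # keep only this channel's bins
        band = np.fft.irfft(spectrum * mask, n=n)  # channel waveform in the time domain
        out[shift:shift + n] += band               # delayed onset for this channel
    return out, shifts
```

Calling `desynchronize(x, fs, 140)` on a waveform `x` sampled at `fs` Hz would, under these assumptions, produce a signal whose channels are misaligned by up to 140 ms, one of the jitter values discussed in the abstract. With `max_jitter_ms=0` the function simply returns the band-limited input.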