Improving syllable identification by a preprocessing method reducing overlap-masking in reverberant environments.

Overlap-masking degrades speech intelligibility in reverberation [R. H. Bolt and A. D. MacDonald, J. Acoust. Soc. Am. 21(6), 577-580 (1949)]. To reduce this degradation, steady-state suppression has been proposed as a preprocessing technique [Arai et al., Proc. Autumn Meet. Acoust. Soc. Jpn., 2001; Acoust. Sci. Tech. 23(8), 229-232 (2002)]. The technique automatically suppresses steady-state portions of speech, which carry more energy but are less crucial for speech perception. The present paper explores the effect of steady-state suppression on the identification of syllables preceded by /a/ under various reverberant conditions. In each of two perception experiments, stimuli were presented to 22 subjects with normal hearing. The stimuli consisted of monosyllables in a carrier phrase, with and without steady-state suppression, and were presented under different reverberant conditions simulated with artificial impulse responses. The results indicate that steady-state suppression yields a statistically significant improvement in consonant identification for reverberation times of 0.7 to 1.2 s. Analysis of confusion matrices shows that identification of voiced consonants, of stop and nasal consonants, and of bilabial, alveolar, and velar consonants was especially improved by steady-state suppression. Steady-state suppression is thus demonstrated to be an effective preprocessing method for improving syllable identification by reducing the effect of overlap-masking under specific reverberant conditions.
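The general idea of suppressing steady-state portions can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `suppress_steady_state`, the frame length, the regression-slope transition measure, the mean-relative threshold, and the attenuation gain are all assumptions chosen for illustration. The sketch marks frames whose log spectrum changes little across neighbouring frames as steady-state and attenuates them, leaving transitional frames untouched.

```python
import numpy as np

def suppress_steady_state(x, fs, frame_ms=20.0, context=2,
                          rel_threshold=0.5, gain=0.4):
    """Attenuate frames whose short-time log spectrum is nearly stationary.

    All parameter values are illustrative assumptions, not values taken
    from the paper. Returns (processed signal, boolean steady-frame mask).
    """
    x = np.asarray(x, dtype=float)
    n = int(fs * frame_ms / 1000)          # samples per frame
    n_frames = len(x) // n
    if n_frames < 2 * context + 1:
        # Too short to estimate a transition measure; return unchanged.
        return x.copy(), np.zeros(n_frames, dtype=bool)

    # Non-overlapping frames, Hann-windowed power spectrum per frame.
    frames = x[:n_frames * n].reshape(n_frames, n)
    power = np.abs(np.fft.rfft(frames * np.hanning(n), axis=1)) ** 2
    # Relative floor so near-silent bins do not dominate the log spectrum.
    log_spec = np.log(power + 1e-6 * power.max() + 1e-12)

    # Least-squares slope of each log-spectral bin over +/- context frames.
    k = np.arange(-context, context + 1)
    padded = np.pad(log_spec, ((context, context), (0, 0)), mode="edge")
    slope = sum(kk * padded[context + kk: context + kk + n_frames]
                for kk in k) / np.sum(k ** 2)

    # Transition measure: mean squared slope across frequency bins.
    transition = np.mean(slope ** 2, axis=1)

    # Frames well below the average transition are treated as steady-state.
    steady = transition < rel_threshold * transition.mean()

    # Apply the per-frame gain; samples after the last full frame are kept.
    out = x.copy()
    frame_gain = np.where(steady, gain, 1.0)
    out[:n_frames * n] *= np.repeat(frame_gain, n)
    return out, steady
```

A practical version would likely smooth the gain across frame boundaries to avoid audible discontinuities; the hard per-frame switch here only keeps the sketch short.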
