Predicting the intelligibility of vocoded and wideband Mandarin Chinese.

Because few cochlear implant users speak Mandarin Chinese, it is extremely difficult to evaluate new speech coding algorithms designed for tonal languages. An intelligibility index that could reliably predict the intelligibility of vocoded (and non-vocoded) Mandarin Chinese would be a viable way to address this challenge. The speech-transmission index (STI) and coherence-based intelligibility measures, among others, have been examined extensively for predicting the intelligibility of English speech, but they have not been evaluated for vocoded or wideband (non-vocoded) Mandarin speech, despite the perceptual differences between the two languages. When these measures were evaluated for both languages, the results indicated that the coherence-based measures seem to be influenced by the characteristics of the spoken language. In Mandarin Chinese, the highest correlation (r = 0.91-0.97) was obtained with a weighted coherence measure that included primarily information from high-intensity voiced segments (e.g., vowels) containing F0 information, which is known to be important for lexical tone recognition. In English, by contrast, the highest correlation was obtained with a coherence measure that included information from weak consonants and vowel/consonant transitions. A band-importance function was proposed that captured information about the amplitude envelope contour. For the STI-based measures, a higher modulation rate (100 Hz) was found necessary to reach maximum correlation (r = 0.94-0.96) with vocoded Mandarin and English recognition.
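
The correlations quoted above (for example r = 0.91-0.97) presumably measure how closely an objective index tracks listener recognition scores across test conditions. The abstract does not say how they were computed, so the following is only a minimal sketch assuming a plain Pearson correlation. The function name `pearson_r` and all numbers in it are hypothetical placeholders for illustration, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values, one entry per test condition.
# These are NOT results from the study.
index_values = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]   # objective measure output
percent_correct = [5.0, 22.0, 41.0, 63.0, 80.0, 92.0]  # listener recognition scores

r = pearson_r(index_values, percent_correct)
print(f"r = {r:.2f}")
```

A value of r close to 1 would mean the index orders and spaces the conditions much as the listeners' scores do, which is the sense in which one measure is said to predict intelligibility better than another.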
