Effects of noise suppression and envelope dynamic range compression on the intelligibility of vocoded sentences for a tonal language.

Vocoder simulation studies have suggested that the type of carrier signal affects the intelligibility of vocoded speech. The present work assessed how carrier signal type interacts with two additional signal-processing stages, single-channel noise suppression and envelope dynamic range compression, in determining the intelligibility of vocoder simulations. In Experiment 1, Mandarin sentences corrupted by speech spectrum-shaped noise (SSN) or two-talker babble (2TB) were processed by one of four single-channel noise-suppression algorithms before undergoing tone-vocoded (TV) or noise-vocoded (NV) processing. In Experiment 2, the dynamic ranges of the multiband envelope waveforms were compressed by scaling the mean-removed envelope waveforms by a compression factor before TV or NV processing. With normal-hearing (NH) listeners, TV Mandarin sentences yielded higher intelligibility scores than NV sentences. The intelligibility advantage of noise-suppressed vocoded speech depended on the masker type (SSN vs 2TB). NV speech was more negatively affected by envelope dynamic range compression than TV speech. These findings suggest an interaction between the carrier signal type used in vocoding and the envelope distortion introduced by signal processing.
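The envelope manipulation of Experiment 2 can be illustrated with a minimal sketch. It assumes that the band mean is added back after the mean-removed envelope is scaled, which the abstract does not state. All function names, the toy envelope, the 1 kHz tone frequency, and the factor of 0.5 are hypothetical choices made for illustration. The noise carrier here is plain white noise, whereas a real NV channel would band-limit the noise to the analysis band.

```python
import math
import random


def compress_envelope(env, factor):
    """Scale the mean-removed envelope by `factor`, then restore the mean.

    Assumption: the mean is added back after scaling. factor = 1 leaves the
    envelope unchanged; factor < 1 reduces its dynamic range.
    """
    mean = sum(env) / len(env)
    return [mean + factor * (e - mean) for e in env]


def dynamic_range_db(env):
    """Peak-to-trough ratio of a strictly positive envelope, in dB."""
    return 20.0 * math.log10(max(env) / min(env))


def tone_carrier(n, freq_hz, fs):
    """Sine carrier, as used in tone-vocoded (TV) processing."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n)]


def noise_carrier(n, seed=0):
    """White-noise carrier; band-limiting is omitted in this sketch."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def modulate(env, carrier):
    """Impose one band's envelope on its carrier, sample by sample."""
    return [e * c for e, c in zip(env, carrier)]


fs = 16000   # sampling rate in Hz (illustrative)
n = 4000     # 0.25 s of signal, one full cycle of the 4 Hz modulation below

# Toy single-band envelope: slow 4 Hz modulation, strictly positive.
env = [0.5 + 0.4 * math.sin(2.0 * math.pi * 4.0 * i / fs) for i in range(n)]

compressed = compress_envelope(env, 0.5)

tv_band = modulate(compressed, tone_carrier(n, 1000.0, fs))
nv_band = modulate(compressed, noise_carrier(n))

print(round(dynamic_range_db(env), 2), round(dynamic_range_db(compressed), 2))
```

In a full vocoder the same steps would run in every analysis band and the modulated bands would be summed. The sketch only shows that a factor below 1 narrows the envelope's dynamic range while keeping its mean level, before either carrier type is applied.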
