Leveraging automatic speech recognition in cochlear implants for improved speech intelligibility under reverberation

Despite recent advances in digital signal processing for cochlear implant (CI) devices, a significant gap remains between the speech identification performance of CI users in reverberation and their performance in anechoic quiet conditions. In contrast, automatic speech recognition (ASR) systems have improved significantly in recent years and now achieve robust recognition in a variety of adverse environments, including reverberation. In this study, we exploit these advances in ASR technology to formulate an alternative solution that benefits CI users. Specifically, an ASR system is developed using multi-condition training on speech data with different reverberation characteristics (e.g., T60 values), resulting in low word error rates (WER) in reverberant conditions. A speech synthesizer is then used to generate speech waveforms from the output of the ASR system, and the synthesized speech is presented to CI listeners. The effectiveness of this hybrid recognition-synthesis CI strategy is evaluated under moderate to highly reverberant conditions (i.e., T60 = 0.3, 0.6, 0.8, and 1.0 s) using speech material extracted from the TIMIT corpus. Experimental results confirm the effectiveness of multi-condition training for ASR performance in reverberation, which in turn yields substantial speech intelligibility gains for CI users in reverberant environments.
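
The abstract reports ASR performance as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the function below may help. It is an illustration only, not the scoring tool used in the study; the function name `word_error_rate` and the example sentences are assumptions introduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; whitespace tokenization is assumed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Example: one deleted word out of five reference words.
wer = word_error_rate("she had your dark suit", "she had dark suit")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.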
