Deep Learning–Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients

Objective: We investigate the clinical effectiveness of a novel deep learning–based noise reduction (NR) approach for Mandarin-speaking cochlear implant (CI) recipients under noisy conditions with challenging noise types at low signal-to-noise ratio (SNR) levels.

Design: The deep learning–based NR approach used in this study consists of two modules, a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is thus termed NC + DDAE. In a series of comprehensive experiments, we conduct qualitative and quantitative analyses of the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR approach and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning–based approach are used to process the noisy utterances. We qualitatively compare the NR approaches using amplitude envelope and spectrogram plots of the processed utterances. Quantitative measures include (1) the normalized covariance measure, an objective test of the intelligibility of the utterances processed by each NR approach; and (2) speech recognition tests conducted with nine Mandarin-speaking CI recipients, who used their own clinical speech processors during testing.

Results: The objective evaluation and the listening test both indicate that, under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two classical NR techniques, under both matched and mismatched training-testing conditions.

Conclusions: Compared with the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach has superior noise suppression capabilities and introduces less distortion to the key speech envelope information, thereby improving speech recognition more effectively for Mandarin-speaking CI recipients. The results suggest that the proposed deep learning–based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.
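The test conditions above mix clean sentences with a masker at fixed SNR levels (0 and 5 dB). As a rough illustration of what such a condition means numerically, the following Python sketch scales a noise signal so that the speech-to-noise power ratio matches a target SNR. It is not the authors' stimulus-preparation code: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine-wave "speech", the Gaussian "noise", and the 16 kHz sample rate are all assumptions made for the example.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat the noise if it is shorter than the speech, then trim to equal length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(speech_power / (gain**2 * noise_power)); solve for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    speech = np.asarray(speech, dtype=float)
    residual = np.asarray(noisy, dtype=float) - speech
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins (assumed): one second of a 220 Hz tone and white noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(16000)

for target in (0.0, 5.0):
    noisy = mix_at_snr(speech, noise, target)
    print(f"target {target:.1f} dB, measured {measured_snr_db(speech, noisy):.2f} dB")
```

At 0 dB the scaled masker carries the same average power as the speech; at 5 dB the speech power is about 3.16 times the masker power.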
