An effectively causal deep learning algorithm to increase intelligibility in untrained noises for hearing-impaired listeners.

Real-time operation is critical for noise reduction in hearing technology. The essential requirement of real-time operation is causality: an algorithm does not use future time-frame information and, instead, completes its operation by the end of the current time frame. The present work extends this requirement through the concept of "effectively causal" processing, in which future time-frame information is used only within the brief delay tolerance of the human speech-perception mechanism. Effectively causal deep learning was used to separate speech from background noise and improve intelligibility for hearing-impaired listeners. A single-microphone, gated convolutional recurrent network was used to perform complex spectral mapping. By estimating both the real and imaginary parts of the noise-free speech, both the magnitude and phase of the estimated noise-free speech were obtained. The deep neural network was trained using a large set of noises and tested using complex noises not employed during training. Significant algorithm benefit was observed in every condition, and it was largest for the listeners with the greatest hearing loss. Allowable delays across different communication settings are reviewed and assessed. The current work demonstrates that effectively causal deep learning can significantly improve intelligibility, in challenging conditions involving untrained background noises, for one of the largest populations of need.
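The two technical ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example values, the 20 ms frame and 10 ms hop, and the simplified delay formula (one frame of buffering plus one hop per future frame used) are all assumptions chosen for illustration. The first function shows how a real and an imaginary estimate for one time-frequency unit yield both magnitude and phase; the second shows how using future frames adds to algorithmic delay.

```python
import cmath
import math


def magnitude_and_phase(real, imag):
    """Return (magnitude, phase in radians) for one time-frequency unit."""
    z = complex(real, imag)
    return abs(z), cmath.phase(z)


def algorithmic_delay_ms(frame_len_ms, hop_ms, lookahead_frames):
    """Simplified delay model (an assumption, not taken from the paper):
    buffering the current frame plus one hop per future frame used."""
    return frame_len_ms + lookahead_frames * hop_ms


# Hypothetical network output for a single time-frequency unit.
mag, phase = magnitude_and_phase(3.0, 4.0)

# Hypothetical framing: 20 ms frames with a 10 ms hop.
strictly_causal = algorithmic_delay_ms(20.0, 10.0, lookahead_frames=0)
effectively_causal = algorithmic_delay_ms(20.0, 10.0, lookahead_frames=1)
```

Under this simplified model, a strictly causal system uses zero look-ahead frames, while an effectively causal system accepts a small, bounded look-ahead as long as the total delay stays within what listeners tolerate.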
