An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type.

Machine learning algorithms that segregate speech from background noise hold considerable promise for alleviating limitations associated with hearing impairment. One of the most important considerations for implementing these algorithms in devices such as hearing aids and cochlear implants is their ability to generalize to conditions not employed during the training stage. A major challenge is generalization to novel noise segments. In the current study, sentences were segregated from multi-talker babble and from cafeteria noise using an algorithm that employs deep neural networks to estimate the ideal ratio mask. Importantly, the algorithm was trained on segments of noise and tested using entirely novel segments of the same nonstationary noise type. Substantial sentence-intelligibility benefit was observed for hearing-impaired listeners in both noise types, despite the use of unseen noise segments during the test stage. Interestingly, normal-hearing listeners displayed benefit in babble but not in cafeteria noise. This result highlights the importance of evaluating these algorithms not only in human subjects, but also in members of the actual target population.
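The abstract names the ideal ratio mask (IRM) as the network's training target but does not define it. The sketch below assumes the commonly used form IRM(t, f) = (S²(t, f) / (S²(t, f) + N²(t, f)))^β with β = 0.5, computed from the premixed speech and noise energies in each time-frequency unit. It also shows, in simplified form, the idea of drawing training and test noise from disjoint portions of one recording so that test segments are unseen. The time-frequency representation, the value of β, the 80/20 split, and all function names are illustrative assumptions, not the study's actual settings, and the sketch computes the ideal target rather than the network's estimate.

```python
import math
import random

BETA = 0.5  # assumption: a common IRM exponent; not stated in the abstract


def ideal_ratio_mask(speech_energy, noise_energy, beta=BETA):
    """IRM(t, f) = (S^2 / (S^2 + N^2)) ** beta for each time-frequency unit.

    Inputs are 2-D lists [frames][channels] of non-negative energies
    (squared magnitudes) of the premixed speech and noise.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # Units with no energy at all get a gain of 0.
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mixture_magnitude, mask):
    """Scale each mixture T-F unit by its mask value.

    With a trained network, an estimated mask would be used here
    in place of the ideal one.
    """
    return [[m * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(mixture_magnitude, mask)]


def split_noise(noise, train_fraction=0.8):
    """Hypothetical helper: split one long noise recording into disjoint
    training and test portions, so test segments are never seen in training."""
    cut = int(len(noise) * train_fraction)
    return noise[:cut], noise[cut:]


def random_segment(noise, length, rng):
    """Hypothetical helper: draw a random contiguous segment of `length` samples."""
    start = rng.randrange(0, len(noise) - length + 1)
    return noise[start:start + length]


if __name__ == "__main__":
    speech = [[4.0, 1.0], [0.0, 9.0]]
    noise = [[4.0, 3.0], [1.0, 0.0]]
    mask = ideal_ratio_mask(speech, noise)
    print(mask)  # values: sqrt(0.5), 0.5, 0.0, 1.0
    train, test = split_noise(list(range(100)))
    print(len(train), len(test), random_segment(test, 5, random.Random(0)))
```

Splitting by position in the recording, rather than sampling training and test segments from the whole file, is what guarantees that no noise sample heard at test time was ever used in training.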
