Paper Information - Machine Learning Approach for Improving the Intelligibility of Noisy Speech

Machine Learning Approach for Improving the Intelligibility of Noisy Speech

Machine-learning-based speech enhancement approaches have recently shown considerable promise for improving the intelligibility of noisy speech for both normal-hearing and hearing-impaired listeners. In this paper we study the speech intelligibility potential of single-microphone speech enhancement based on deep neural networks (DNNs), one family of machine learning models. We show that a DNN-based speech enhancement approach, once trained specifically to handle many noise types and signal-to-noise ratios (SNRs), can attain large improvements in speech intelligibility. The DNN models are trained to learn a mapping from noisy speech features to the coefficients of ratio time-frequency masks. The estimated masks are applied to the magnitude spectra of the noisy speech, and the phase of the noisy speech is reused to reconstruct speech with enhanced intelligibility. Results in five noise conditions (exhibition hall, coffee shop, airport, car, and babble) and at five SNRs (−10 dB, −5 dB, 0 dB, 5 dB, and 10 dB) show that the DNN-based ratio mask outperforms the competing methods, nonnegative matrix factorization (NMF) and log minimum mean square error (LMMSE) estimation, in terms of two objective speech intelligibility metrics: short-time objective intelligibility (STOI) and normalized subband envelope correlation (NSEC).
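The masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the mask formula, so the square-root power ratio used here is one common definition of a ratio mask and is an assumption. The mask is also computed from known clean and noise spectra (an oracle), whereas in the paper a trained DNN estimates it from noisy speech features. The function names, the array sizes, and the random stand-in spectra are all hypothetical.

```python
import numpy as np

def ratio_mask(clean_mag, noise_mag, eps=1e-12):
    """Oracle ratio mask in [0, 1] (assumed square-root power-ratio form).

    In the paper a trained DNN would estimate these coefficients from
    noisy speech features; here they are computed from known components.
    """
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(clean_pow / (clean_pow + noise_pow + eps))

def apply_mask(noisy_spec, mask):
    """Scale the noisy magnitude by the mask and reuse the noisy phase."""
    magnitude = np.abs(noisy_spec)
    phase = np.angle(noisy_spec)
    return mask * magnitude * np.exp(1j * phase)

# Random complex arrays stand in for STFT spectra (hypothetical sizes).
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, time frames)
clean_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noisy_spec = clean_spec + noise_spec

mask = ratio_mask(np.abs(clean_spec), np.abs(noise_spec))
enhanced_spec = apply_mask(noisy_spec, mask)
```

Because only the magnitude is modified, the enhanced spectrum equals the noisy spectrum scaled by a real gain between 0 and 1 in each time-frequency cell. An inverse STFT of `enhanced_spec` would then give the enhanced waveform.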

Nasir Saleem | Muhammad Irfan Khattak | Sheeraz Ahmad | Muhammad Ismail Mohmand | Muhammad Yousaf Ali
