论文信息 - Wiener Gain and Deep Neural Networks: A Well-Balanced Pair For Speech Enhancement

Wiener Gain and Deep Neural Networks: A Well-Balanced Pair For Speech Enhancement

This paper proposes a Deep Neural Network (DNN)-based Wiener gain estimator for speech enhancement. The proposal is in the framework of the classical spectral-domain speech enhancement algorithms. In this case, we used the Optimal Modified Log-Spectral Amplitude (OMLSA), but consider that this proposal could fit many alternative speech estimation algorithms. We determined the best usage of the DNN approach at learning a robust instance of the Wiener gain estimator according to the characteristics of the SNR estimation and the gain function. To design a DNN architecture adjusted for the speech enhancement task, we study various configuration issues frequently used in DNN-based solutions, including speech representations, residual connections, and causal vs. non-causal designs. Thus, we provide conclusions for the use of DNN architectures with the enhancement purpose. Experiments show that the proposal provides results on the state-of-the-art. But beyond the objective quality measures, there are examples of noisy vs. enhanced speech available for listening to demonstrate in practice the skills of the method in real audio.

Eduardo Lleida | Antonio Miguel | Alfonso Ortega | Dayana Ribas

[1] Jan Wouters,et al. Speech enhancement based on neural networks applied to cochlear implant coding strategies , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] DeLiang Wang,et al. A deep learning algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker and reverberation. , 2019, The Journal of the Acoustical Society of America.

[3] DeLiang Wang,et al. Binary and ratio time-frequency masks for robust speech recognition , 2006, Speech Commun..

[4] DeLiang Wang,et al. Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners. , 2014, The Journal of the Acoustical Society of America.

[5] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[6] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7] Eduardo Lleida,et al. Speech Enhancement with Wide Residual Networks in Reverberant Environments , 2019, INTERSPEECH.

[8] A.V. Oppenheim,et al. Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[9] Israel Cohen,et al. Speech enhancement for non-stationary noise environments , 2001, Signal Process..

[10] David Malah,et al. Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[11] DeLiang Wang,et al. An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type. , 2015, The Journal of the Acoustical Society of America.

[12] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13] Richard M. Stern,et al. Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis , 2008, INTERSPEECH.

[14] Jesper Jensen,et al. DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement , 2013, DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement.

[15] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[16] Rainer Martin,et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[17] Ming Tu,et al. Speech enhancement based on Deep Neural Networks with skip connections , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18] Eduardo Lleida,et al. Progressive Speech Enhancement with Residual Connections , 2019, INTERSPEECH.

[19] Jessica J. M. Monaghan,et al. Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users , 2017, Hearing Research.

[20] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[22] DeLiang Wang,et al. Long short-term memory for speaker generalization in supervised speech separation. , 2017, The Journal of the Acoustical Society of America.

[23] DeLiang Wang,et al. Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.