Wiener Gain and Deep Neural Networks: A Well-Balanced Pair For Speech Enhancement

This paper proposes a Deep Neural Network (DNN)-based Wiener gain estimator for speech enhancement. The proposal is in the framework of the classical spectral-domain speech enhancement algorithms. In this case, we used the Optimal Modified Log-Spectral Amplitude (OMLSA), but consider that this proposal could fit many alternative speech estimation algorithms. We determined the best usage of the DNN approach at learning a robust instance of the Wiener gain estimator according to the characteristics of the SNR estimation and the gain function. To design a DNN architecture adjusted for the speech enhancement task, we study various configuration issues frequently used in DNN-based solutions, including speech representations, residual connections, and causal vs. non-causal designs. Thus, we provide conclusions for the use of DNN architectures with the enhancement purpose. Experiments show that the proposal provides results on the state-of-the-art. But beyond the objective quality measures, there are examples of noisy vs. enhanced speech available for listening to demonstrate in practice the skills of the method in real audio.

[1]  Jan Wouters,et al.  Speech enhancement based on neural networks applied to cochlear implant coding strategies , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  DeLiang Wang,et al.  A deep learning algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker and reverberation. , 2019, The Journal of the Acoustical Society of America.

[3]  DeLiang Wang,et al.  Binary and ratio time-frequency masks for robust speech recognition , 2006, Speech Commun..

[4]  DeLiang Wang,et al.  Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners. , 2014, The Journal of the Acoustical Society of America.

[5]  DeLiang Wang,et al.  Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  Eduardo Lleida,et al.  Speech Enhancement with Wide Residual Networks in Reverberant Environments , 2019, INTERSPEECH.

[8]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[9]  Israel Cohen,et al.  Speech enhancement for non-stationary noise environments , 2001, Signal Process..

[10]  David Malah,et al.  Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[11]  DeLiang Wang,et al.  An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type. , 2015, The Journal of the Acoustical Society of America.

[12]  DeLiang Wang,et al.  Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Richard M. Stern,et al.  Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis , 2008, INTERSPEECH.

[14]  Jesper Jensen,et al.  DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement , 2013, DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement.

[15]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[16]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[17]  Ming Tu,et al.  Speech enhancement based on Deep Neural Networks with skip connections , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Eduardo Lleida,et al.  Progressive Speech Enhancement with Residual Connections , 2019, INTERSPEECH.

[19]  Jessica J. M. Monaghan,et al.  Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users , 2017, Hearing Research.

[20]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[22]  DeLiang Wang,et al.  Long short-term memory for speaker generalization in supervised speech separation. , 2017, The Journal of the Acoustical Society of America.

[23]  DeLiang Wang,et al.  Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.