A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network

Speech enhancement based on generative adversarial networks (GANs) can overcome problems of many classical speech enhancement methods, such as relying on first-order statistics of signals and ignoring the phase mismatch between noisy and clean signals. However, GANs are hard to train and suffer from vanishing gradients, which can lead to poor generated samples. In this paper, we propose a relativistic average least-squares loss function with a mixed penalty term for the speech enhancement generative adversarial network. The mixed penalty term minimizes the distance between generated and clean samples more effectively. Experimental results on the Valentini 2016 and Valentini 2017 datasets show that the proposed loss makes GAN training more stable and achieves good performance in both objective and subjective evaluation.
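To make the loss concrete, the sketch below shows the standard relativistic average least-squares (RaLSGAN) adversarial terms together with a penalty on the distance between enhanced and clean waveforms. The abstract does not specify the exact form of the mixed penalty, so the `mixed_penalty` function, its `alpha` blend of L1 and L2 distances, and the `lambda_pen` weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def ralsgan_d_loss(d_real, d_fake):
    """Relativistic average least-squares loss for the discriminator.

    d_real: discriminator outputs on clean (real) speech, shape (batch, 1)
    d_fake: discriminator outputs on enhanced (generated) speech, shape (batch, 1)
    """
    # Each sample is scored relative to the mean score of the opposite class.
    loss_real = torch.mean((d_real - d_fake.mean() - 1.0) ** 2)
    loss_fake = torch.mean((d_fake - d_real.mean() + 1.0) ** 2)
    return 0.5 * (loss_real + loss_fake)


def ralsgan_g_loss(d_real, d_fake):
    """Relativistic average least-squares loss for the generator (targets swapped)."""
    loss_real = torch.mean((d_real - d_fake.mean() + 1.0) ** 2)
    loss_fake = torch.mean((d_fake - d_real.mean() - 1.0) ** 2)
    return 0.5 * (loss_real + loss_fake)


def mixed_penalty(enhanced, clean, alpha=0.5):
    """Hypothetical mixed penalty: a weighted sum of L1 and L2 distances
    between enhanced and clean waveforms (an assumed form for illustration)."""
    l1 = F.l1_loss(enhanced, clean)
    l2 = F.mse_loss(enhanced, clean)
    return alpha * l1 + (1.0 - alpha) * l2


def generator_total_loss(d_real, d_fake, enhanced, clean, lambda_pen=100.0):
    """Adversarial generator loss plus the penalty term, weighted by an
    illustrative lambda_pen."""
    return ralsgan_g_loss(d_real, d_fake) + lambda_pen * mixed_penalty(enhanced, clean)
```

In use, `d_real` and `d_fake` would come from a conditional discriminator applied to (clean, noisy) and (enhanced, noisy) pairs respectively, and the penalty term is what pulls the enhanced waveform toward the clean target independently of the adversarial signal.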
