Global variance equalization for improving deep neural network based speech enhancement

We address the over-smoothing of enhanced speech in deep neural network (DNN) based speech enhancement. We propose a global variance equalization framework with two schemes: post-processing, and post-training with a modified objective function. Both schemes equalize the global variance of the estimated speech with that of the reference clean speech. Experimental results show that the quality of the estimated clean speech signal improves both subjectively and objectively, as measured by the perceptual evaluation of speech quality (PESQ). The gains are largest in mismatched environments, where the additive noise was not seen during DNN training.
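The two schemes can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify:

* Features are frames-by-dimensions lists of values, such as log-power spectra.
* Global variance (GV) is the per-dimension variance over all frames.
* Post-processing rescales each dimension around its mean by a factor α = sqrt(GV_ref / GV_est).
* Post-training adds a hypothetical GV-mismatch penalty, scaled by a hypothetical `weight`, to the mean squared error.

The paper's actual formulation may differ. For example, it may use a single global scaling factor, or compute the GV over a different set of data.

```python
# Sketch of global variance (GV) equalization, under stated assumptions:
# features are frames x dims lists; GV = per-dimension variance over all frames.
from math import sqrt


def global_variance(frames):
    """Return per-dimension means and (population) variances over all frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]
    return means, variances


def gv_equalize(estimated, reference_gv, eps=1e-12):
    """Post-processing scheme (assumed form): scale each dimension around its
    mean so the estimated GV matches the reference GV."""
    means, est_gv = global_variance(estimated)
    alphas = [sqrt(r / max(e, eps)) for r, e in zip(reference_gv, est_gv)]
    return [[alphas[d] * (f[d] - means[d]) + means[d] for d in range(len(f))]
            for f in estimated]


def gv_augmented_loss(estimated, target, reference_gv, weight=1.0):
    """Post-training objective (hypothetical form): MSE plus a weighted
    penalty on the squared GV mismatch, averaged over dimensions."""
    n, dims = len(estimated), len(estimated[0])
    mse = sum((e[d] - t[d]) ** 2
              for e, t in zip(estimated, target) for d in range(dims)) / (n * dims)
    _, est_gv = global_variance(estimated)
    gv_term = sum((g - r) ** 2 for g, r in zip(est_gv, reference_gv)) / dims
    return mse + weight * gv_term


# Toy data: the "estimate" has the right means but a smaller spread (over-smoothed).
reference = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
estimated = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
_, ref_gv = global_variance(reference)

equalized = gv_equalize(estimated, ref_gv)
loss_before = gv_augmented_loss(estimated, reference, ref_gv)
loss_after = gv_augmented_loss(equalized, reference, ref_gv)
print(equalized, loss_before, loss_after)
```

In this toy case, the estimate's variance is a quarter of the reference's, so each dimension is stretched by a factor of 2 and the reference is recovered exactly. The loss is written as a plain function for readability. To use it for post-training, it would have to be expressed in an automatic-differentiation framework.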
