A Priori SNR Estimation Using Discriminative Non-Negative Matrix Factorization

The a priori signal-to-noise ratio (SNR) carries critical information about a single-channel mixture of speech and noise and can be used by speech enhancement algorithms. In this paper, we propose a novel a priori SNR estimator based on estimates obtained from discriminative non-negative matrix factorization (DNMF). The idea is to use DNMF to compute a preliminary estimate of the speech components, which can either be used directly to estimate the a priori SNR or be combined with the well-known decision-directed (DD) approach by Ephraim and Malah. We present a speaker-independent but noise-dependent DNMF-based a priori SNR estimator. Speech enhancement simulations in the presence of non-stationary noise validate the new approach: combined with well-known spectral weighting rules, it outperforms several NMF-based and non-NMF-based state-of-the-art methods with respect to both SNR improvement and perceptual speech quality.
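The two routes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-bin noise power `noise_psd` (held constant over frames for simplicity), the smoothing factor `beta`, the floor `xi_min`, and the Wiener rule used to form the previous-frame speech estimate are all assumptions made for illustration. The preliminary speech magnitudes passed to `direct_a_priori_snr` stand in for whatever a DNMF stage would produce. How the paper combines the DNMF estimate with the DD recursion is not stated in the abstract, so only the direct estimate and the classic DD recursion are shown.

```python
def wiener_gain(xi):
    """Wiener spectral weighting rule G = xi / (1 + xi)."""
    return xi / (1.0 + xi)


def direct_a_priori_snr(speech_mag, noise_psd, xi_min=1e-3):
    """Direct estimate: xi = |S_hat|^2 / sigma_N^2 per frequency bin.

    speech_mag is a preliminary speech magnitude estimate (assumed to
    come from some NMF-style stage); noise_psd is the noise power.
    """
    return [max(s * s / n, xi_min) for s, n in zip(speech_mag, noise_psd)]


def decision_directed(noisy_mag_frames, noise_psd, beta=0.98, xi_min=1e-3):
    """Classic decision-directed recursion over frames.

    Each frame is a list of noisy magnitudes, one per frequency bin.
    """
    prev_speech_pow = None
    xi_frames = []
    for frame in noisy_mag_frames:
        # A posteriori SNR and its maximum-likelihood a priori counterpart.
        gamma = [y * y / n for y, n in zip(frame, noise_psd)]
        ml = [max(g - 1.0, 0.0) for g in gamma]
        if prev_speech_pow is None:
            # First frame: no previous estimate, fall back to the ML term.
            xi = ml
        else:
            xi = [beta * p / n + (1.0 - beta) * m
                  for p, n, m in zip(prev_speech_pow, noise_psd, ml)]
        xi = [max(x, xi_min) for x in xi]
        # Enhanced speech power of this frame, reused in the next frame.
        prev_speech_pow = [(wiener_gain(x) * y) ** 2
                           for x, y in zip(xi, frame)]
        xi_frames.append(xi)
    return xi_frames


if __name__ == "__main__":
    noise_psd = [1.0, 1.0]
    frames = [[2.0, 0.5], [2.0, 0.5]]
    print(direct_a_priori_snr([1.5, 0.1], noise_psd))
    print(decision_directed(frames, noise_psd, beta=0.5))
```

A large `beta` in the DD recursion smooths the estimate over time and makes it slow to follow speech onsets, which is one reason a frame-wise preliminary speech estimate is attractive in non-stationary noise.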
