Training and compensation of class-conditioned NMF bases for speech enhancement

Abstract In this paper, we introduce a training and compensation algorithm for the class-conditioned basis vectors in the non-negative matrix factorization (NMF) model for single-channel speech enhancement. The main goal is to estimate the basis vectors of different signal sources so that the bases of one source do not represent another, thereby reducing residual noise components whose features resemble those of the speech signal. During the proposed training stage, the basis matrices for the clean speech and the noises are estimated jointly under the constraint that they belong to different classes. To this end, we employ the probabilistic generative model (PGM) of classification, specified by class-conditional densities, as an a priori distribution for the basis vectors. The update rules for the NMF and PGM parameters are obtained jointly using the variational Bayesian expectation-maximization (VBEM) algorithm, which guarantees convergence to a stationary point. A second goal of the proposed algorithm is to handle a mismatch between the characteristics of the training and test data. This is accomplished during the proposed enhancement stage, where we implement a basis compensation scheme: extra free basis vectors are used to capture features that are not included in the training data. Objective experimental results for different combinations of speaker and noise types show that the proposed algorithm provides better speech enhancement performance than the benchmark algorithms under various conditions.
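
The basis compensation idea in the enhancement stage can be illustrated with a minimal sketch. It is not the paper's VBEM algorithm: it assumes standard Kullback-Leibler NMF with multiplicative updates, keeps pre-trained speech and noise bases fixed, and adapts only a few extra free basis vectors to the noisy magnitude spectrogram. The function names (`enhance`, `kl_divergence`), the number of free bases, and the choice to treat the free bases as non-speech in the mask are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    V_hat = V_hat + EPS
    return float(np.sum(V * np.log((V + EPS) / V_hat) - V + V_hat))


def enhance(V, W_speech, W_noise, n_free=2, n_iter=100, seed=0):
    """Estimate the speech magnitude spectrogram from a noisy one.

    V        : (n_freq, n_frames) non-negative noisy magnitude spectrogram
    W_speech : (n_freq, Ks) pre-trained speech bases, kept fixed
    W_noise  : (n_freq, Kn) pre-trained noise bases, kept fixed
    n_free   : number of extra free bases adapted to the test data
               (hypothetical parameter, not taken from the paper)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    Ks, Kn = W_speech.shape[1], W_noise.shape[1]

    # Stack fixed bases with randomly initialized free bases.
    W_free = rng.random((n_freq, n_free)) + EPS
    W = np.hstack([W_speech, W_noise, W_free])
    H = rng.random((W.shape[1], n_frames)) + EPS

    free = slice(Ks + Kn, None)  # columns of W that may be updated
    ones = np.ones_like(V)

    for _ in range(n_iter):
        # Update all activations with the KL multiplicative rule.
        V_hat = W @ H + EPS
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + EPS)

        # Update only the free bases; trained bases stay untouched.
        V_hat = W @ H + EPS
        W[:, free] *= ((V / V_hat) @ H[free].T) / (ones @ H[free].T + EPS)

    # Wiener-like mask; free bases are counted as non-speech here (assumption).
    speech_part = W[:, :Ks] @ H[:Ks]
    mask = speech_part / (W @ H + EPS)
    return mask * V, W, H
```

As a usage example, calling `enhance(V, W_speech, W_noise, n_free=2)` on a noisy magnitude spectrogram `V` returns the masked speech estimate together with the stacked basis matrix and the activations. Setting `n_free=0` reduces the sketch to plain supervised NMF with no compensation for mismatched conditions.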
