A SUPERVISED MULTI-CHANNEL SPEECH ENHANCEMENT ALGORITHM BASED ON BAYESIAN NMF MODEL

In this paper, we introduce a supervised multi-channel speech enhancement algorithm based on a Bayesian multi-channel non-negative matrix factorization (MNMF) model. In the proposed framework, we consider the probabilistic generative model (PGM) of MNMF, specified by Poisson-distributed latent variables and gamma-distributed priors. In the training stage, the MNMF parameters of the speech and noise sources are estimated via the variational Bayesian expectation-maximization (VBEM) algorithm. In the enhancement stage, the clean speech signal is estimated via the MNMF-based minimum variance distortionless response (MVDR) beamformer. To further improve the enhanced speech quality, we efficiently combine the MNMF-based beamforming technique with a classical unsupervised single-channel enhancement method. Experiments show that the proposed method can provide better enhancement performance than the selected benchmarks.
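As a rough illustration of the enhancement stage, the following sketch computes MVDR beamformer weights for a single frequency bin. It is not the authors' implementation: the noise covariance and steering vector here are random placeholders (in the proposed framework they would come from the trained MNMF model), and the function name `mvdr_weights`, the diagonal loading term, and all numeric values are assumptions made for this example.

```python
import numpy as np

def mvdr_weights(noise_cov, steering, loading=1e-6):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    m = noise_cov.shape[0]
    # Diagonal loading (an assumption of this sketch) keeps the solve well conditioned.
    loaded = noise_cov + loading * np.trace(noise_cov).real / m * np.eye(m)
    r_inv_d = np.linalg.solve(loaded, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Placeholder inputs: a random Hermitian positive-definite noise covariance
# and a hypothetical steering vector for a 4-microphone array.
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, 8)) + 1j * rng.standard_normal((M, 8))
noise_cov = A @ A.conj().T / 8
steering = np.exp(-1j * 2 * np.pi * 0.1 * np.arange(M))

w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: the response toward the target, w^H d, should equal 1.
response = np.vdot(w, steering)
```

The weights minimize the output noise power subject to passing the target direction undistorted, which is why `response` is expected to be numerically close to one. The enhanced bin would then be obtained as `np.vdot(w, x)` for a multi-channel observation vector `x`.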
