Paper information - A SUPERVISED MULTI-CHANNEL SPEECH ENHANCEMENT ALGORITHM BASED ON BAYESIAN NMF MODEL

In this paper, we introduce a supervised multi-channel speech enhancement algorithm based on a Bayesian multi-channel non-negative matrix factorization (MNMF) model. In the proposed framework, we consider the probabilistic generative model (PGM) of MNMF, specified by Poisson-distributed latent variables and gamma-distributed priors. In the training stage, the MNMF parameters of the speech and noise sources are estimated via the variational Bayesian expectation-maximization (VBEM) algorithm. In the enhancement stage, the clean speech signal is estimated via the MNMF-based minimum variance distortionless response (MVDR) beamformer. To further improve the enhanced speech quality, we efficiently combine the MNMF-based beamforming technique with a classical unsupervised single-channel enhancement method. Experiments show that the proposed method can provide better enhancement performance than the selected benchmarks.
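To make the enhancement stage more concrete, the following is a minimal sketch of the textbook MVDR beamformer for a single frequency bin, w = R^{-1} d / (d^H R^{-1} d), where R is a noise spatial covariance matrix and d is a steering vector. It is not the authors' implementation: the abstract only states that these quantities come from the trained MNMF model, so here both are replaced by random stand-ins. The function name `mvdr_weights`, the `loading` parameter, and the diagonal loading step itself are assumptions made for illustration.

```python
import numpy as np


def mvdr_weights(noise_cov, steering, loading=1e-6):
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    noise_cov: (M, M) Hermitian noise spatial covariance (stand-in for an
               MNMF-based estimate, which is not reproduced here).
    steering:  (M,) steering vector of the target source.
    loading:   relative diagonal loading; an assumption of this sketch,
               added only to keep the linear solve well conditioned.
    """
    m = noise_cov.shape[0]
    loaded = noise_cov + loading * (np.trace(noise_cov).real / m) * np.eye(m)
    rinv_d = np.linalg.solve(loaded, steering)      # R^{-1} d
    return rinv_d / (steering.conj() @ rinv_d)      # normalise so that w^H d = 1


# Hypothetical data: 4 microphones, random unit-modulus steering vector,
# and a sample covariance built from 200 random complex noise snapshots.
rng = np.random.default_rng(0)
M = 4
d = np.exp(-2j * np.pi * rng.uniform(size=M))
snapshots = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = snapshots @ snapshots.conj().T / 200

w = mvdr_weights(R, d)
response = w.conj() @ d                  # distortionless constraint: should equal 1
noise_out = (w.conj() @ R @ w).real      # residual noise power at the beamformer output
noise_ref = R[0, 0].real                 # noise power at the first microphone
```

By construction the target direction passes with unit gain (`response` is 1 up to rounding), while `noise_out` cannot exceed the noise power at any single microphone beyond the small effect of the loading term. In a full system this computation would be repeated for every time-frequency bin.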

Hanwook Chung | Eric Plourde | Benoît Champagne
