Speech enhancement using posterior regularized NMF with bases update

Abstract: This paper proposes a speech enhancement method that combines a statistical model-based approach with a non-negative matrix factorization (NMF)-based approach and updates the speech and noise bases on-line. Template-based approaches are more robust and perform better in non-stationary noise than statistical model-based approaches, but they depend on a priori information. Combining the two approaches avoids the drawbacks of both. To improve performance further, the speech and noise bases are adapted simultaneously within the NMF approach, guided by the estimated speech presence probability (SPP). The proposed method outperforms the benchmark algorithms in terms of perceptual evaluation of speech quality (PESQ) and source-to-distortion ratio (SDR) in stationary and non-stationary noise environments, with both matched and mismatched noise bases.
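
For orientation, the NMF side of such a scheme can be sketched as follows. This is a minimal illustration of generic supervised NMF enhancement, not the method proposed in the paper: it assumes pre-trained, fixed speech and noise bases, uses the standard Kullback-Leibler multiplicative updates, and applies a Wiener-style gain. The function names (`nmf_kl`, `enhance`) and all parameters are hypothetical. The SPP-guided on-line bases update and the posterior regularization described in the abstract are not reproduced here.

```python
import numpy as np

def nmf_kl(V, W, H, n_iter=50, update_W=True, eps=1e-12):
    """Standard multiplicative updates for KL-divergence NMF, V ~= W @ H.

    V: (freq, frames) non-negative magnitude spectrogram.
    W: (freq, k) bases, H: (k, frames) activations. Both are updated in place.
    """
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def enhance(V_noisy, W_speech, W_noise, n_iter=50, seed=0):
    """Estimate speech from a noisy spectrogram using fixed bases.

    Assumption: W_speech and W_noise were trained beforehand on clean
    speech and on noise; they are kept fixed here (no on-line update).
    """
    rng = np.random.default_rng(seed)
    W = np.hstack([W_speech, W_noise])            # concatenated dictionary
    H = rng.random((W.shape[1], V_noisy.shape[1])) + 1e-3
    W, H = nmf_kl(V_noisy, W, H, n_iter, update_W=False)
    k = W_speech.shape[1]
    S = W[:, :k] @ H[:k]                          # speech part of the model
    N = W[:, k:] @ H[k:]                          # noise part of the model
    gain = S / (S + N + 1e-12)                    # Wiener-style soft mask
    return gain * V_noisy, gain
```

In a bases-update variant, the `update_W=False` call would be replaced by an update that also adapts the columns of `W`, with the adaptation controlled per time-frequency bin by some estimate of speech presence; how that weighting is defined is specific to the paper and is left out of this sketch.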
