Sparse NMF based speech enhancement with bases update

In this paper, a combination of methods based on statistical modelling and Non-negative Matrix Factorization (NMF) for speech enhancement using speech and noise bases with on-line update is proposed. Template-based approaches are known to be more robust in the presence of non-stationary noises than methods based on statistical modeling. However, template-based approaches depend on a-priori information. The drawbacks of both the approaches can be avoided by combining them. In NMF approach, speech bases and noise bases are simultaneously adapted to further improve the performance. The proposed method outperforms other benchmark algorithms in terms of perceptual evaluation of speech quality (PESQ) and source-to-distortion ratio (SDR) in stationary and non-stationary noise environment conditions with matched and mismatched noise basis.

[1]  Nam Soo Kim,et al.  NMF-Based Speech Enhancement Using Bases Update , 2015, IEEE Signal Processing Letters.

[2]  Paris Smaragdis,et al.  Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[4]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[5]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[6]  J. Larsen,et al.  Wind Noise Reduction using Non-Negative Sparse Coding , 2007, 2007 IEEE Workshop on Machine Learning for Signal Processing.

[7]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Philipos C. Loizou,et al.  A noise-estimation algorithm for highly non-stationary environments , 2006, Speech Commun..

[9]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[10]  B. Raj,et al.  Latent variable decomposition of spectrograms for single channel speaker separation , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[11]  Bhiksha Raj,et al.  Regularized non-negative matrix factorization with temporal dependencies for speech denoising , 2008, INTERSPEECH.

[12]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Shih-Chii Liu,et al.  Linear Methods for Efficient and Fast Separation of Two Sources Recorded with a Single Microphone , 2015, Neural Computation.

[14]  Rémi Gribonval,et al.  Non negative sparse representation for Wiener based source separation with a single sensor , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[15]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[16]  Julian Eggert,et al.  Incremental Learning in the Non-negative Matrix Factorization , 2008, ICONIP.

[17]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[18]  Bhiksha Raj,et al.  Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.

[19]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Arne Leijon,et al.  A new linear MMSE filter for single channel speech enhancement based on Nonnegative Matrix Factorization , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[21]  Michael W. Berry,et al.  Algorithms and applications for approximate nonnegative matrix factorization , 2007, Comput. Stat. Data Anal..

[22]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[23]  Nam Soo Kim,et al.  Speech enhancement combining statistical models and NMF with update of speech and noise bases , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  Jonathan Le Roux,et al.  Non-negative dynamical system with application to speech and audio , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.