Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures

The problem of separating speech from monaural mixtures (with other non-speech or speech signals) has attracted increasing attention in recent years. Among the various solutions proposed, the most popular are based on compositional models such as non-negative matrix factorization (NMF) and latent variable models. Although these techniques are highly effective, they largely ignore the inherently phonetic nature of speech. In this paper we present a phoneme-dependent NMF-based algorithm for separating speech from monaural mixtures. Experiments on speech mixed with music indicate that the proposed algorithm can significantly improve separation performance over conventional NMF-based separation.
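For orientation, the conventional NMF-based separation used as the baseline above can be sketched as follows. This is a minimal illustration, not the phoneme-dependent algorithm of this paper. It assumes magnitude spectrograms as input, the KL-divergence multiplicative updates of Lee and Seung, and a soft-mask reconstruction. The function names `nmf_kl` and `separate`, the number of bases, and the iteration count are all hypothetical choices made for the example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, n_bases, n_iter=200, W=None, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H using
    KL-divergence multiplicative updates.

    If W is supplied it is held fixed and only the activations H are
    estimated; n_bases is then ignored.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    update_W = W is None
    if update_W:
        W = rng.random((n_freq, n_bases)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.T @ ones + EPS)
        if update_W:
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (ones @ H.T + EPS)
            # Normalise each basis to unit sum; rescale H so W @ H is unchanged.
            scale = W.sum(axis=0, keepdims=True) + EPS
            W /= scale
            H *= scale.T
    return W, H


def separate(V_mix, W_speech, W_music, n_iter=200):
    """Supervised separation: keep the pre-trained bases fixed, fit
    activations on the mixture, and split it with a soft mask."""
    W = np.hstack([W_speech, W_music])
    _, H = nmf_kl(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech part of the model
    M = W_music @ H[k:]    # music part of the model
    mask = S / (S + M + EPS)
    return mask * V_mix, (1.0 - mask) * V_mix
```

In use, one would train `W_speech` and `W_music` by calling `nmf_kl` on clean speech and music spectrograms, then pass them to `separate` together with the mixture spectrogram. A phoneme-dependent variant would presumably replace the single speech dictionary with phoneme-specific ones, but the details of that step are not given in this abstract and are not sketched here.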
