Single channel source separation using smooth Nonnegative Matrix Factorization with Markov Random Fields

This paper presents a single channel source separation method based on an extension of Nonnegative Matrix Factorization (NMF) algorithm by smoothing the original posterior probabilities with an additional Markov Random Fields (MRF) structure. Our method is based on the alternative interpretation of NMF with β-divergence as latent variable models. By doing so, we can redefine NMF-based separation procedure as a Bayesian labeling problem where each label stands for the mask for a specific source. This understanding leads us to intervene in the calculation of posterior probabilities, so that the priors from MRF's neighboring structure can smooth out isolated masking values that have different labeling results from their neighbors. Experiments on several dictionary-based source separation tasks show sensible performance gains.

[1]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[2]  Rob A. Rutenbar,et al.  Stereophonic spectrogram segmentation using Markov random fields , 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[3]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[4]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[5]  Ali Taylan Cemgil,et al.  Bayesian Inference for Nonnegative Matrix Factorisation Models , 2009, Comput. Intell. Neurosci..

[6]  Mihoko Minami,et al.  Robust Blind Source Separation by Beta Divergence , 2002, Neural Computation.

[7]  Bhiksha Raj,et al.  Non-negative Hidden Markov Modeling of Audio with Application to Source Separation , 2010, LVA/ICA.

[8]  Minje Kim,et al.  Nonnegative Matrix Partial Co-Factorization for Spectral and Temporal Drum Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[9]  Paris Smaragdis,et al.  Prediction based filtering and smoothing to exploit temporal dependencies in NMF , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Andrzej Cichocki,et al.  Families of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities , 2010, Entropy.

[11]  Scott Rickard,et al.  Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[12]  C. Févotte,et al.  Automatic Relevance Determination in Nonnegative Matrix Factorization , 2009 .

[13]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[14]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[16]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17]  Hirokazu Kameoka,et al.  A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .

[19]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .