Blind audio source separation of stereo mixtures using Bayesian Non-negative Matrix Factorization

This paper proposes a novel approach for estimating the number of sources and separating them in convolutive stereo audio mixtures. First, an angular spectrum-based method, built on a nonlinear GCC-PHAT metric, is applied to count and locate the sources. The estimated channel coefficients are then used to obtain a preliminary estimate of each source spectrogram through binary masking. Next, the individual spectrograms are decomposed using a Bayesian NMF approach, so that the number of components required to model each source is inferred from the data. The resulting factors serve as initial values for an EM algorithm that maximizes the joint likelihood of the two-channel data in order to extract the individual source signals. This initialization scheme is shown to greatly improve separation performance over random initialization. The experiments are performed on synthetic mixtures of speech and music signals.
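
The source localization step relies on a GCC-PHAT-style metric. As a rough illustration of the underlying idea only, the sketch below estimates the inter-channel delay of a single source with plain (linear) GCC-PHAT using NumPy. It is not the nonlinear angular-spectrum variant described in the abstract, and the function name `gcc_phat_delay` and the parameter `max_lag` are hypothetical names chosen for this example.

```python
import numpy as np


def gcc_phat_delay(x1, x2, max_lag):
    """Estimate the delay (in samples) of x2 relative to x1.

    Plain GCC-PHAT sketch: an assumption for illustration, not the
    nonlinear metric used in the paper.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, then PHAT weighting (keep the phase only).
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain; index k corresponds to lag k (mod n).
    cc = np.fft.irfft(cross, n=n)

    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


# Usage: a white-noise signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), s[:-5]))
estimated = gcc_phat_delay(s, delayed, max_lag=20)
```

In a multi-source stereo mixture, a metric of this kind would presumably be evaluated per time-frequency bin and pooled over candidate delays, so that peaks in the pooled function indicate both how many sources are present and where they lie.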
