Temporal annotation-based audio source separation using weighted nonnegative matrix factorization

We consider an emerging user-guided approach to audio source separation that relies on temporal annotation of source activity along the mixture. In the baseline algorithm, nonnegative matrix factorization (NMF) is typically used as the spectral model for the audio sources. In this paper we propose two weighting strategies, incorporated into the NMF formulation, that better exploit the annotation. We then derive the corresponding multiplicative update (MU) rules for parameter estimation. The proposed approach was objectively evaluated within the fourth community-based Signal Separation Evaluation Campaign (SiSEC 2013) and shown to outperform the baseline algorithm, while obtaining results comparable to those of some other state-of-the-art methods.
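To make the idea of a weighted NMF with multiplicative updates concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not specify the two weighting strategies or the divergence, so everything here is an assumption: the sketch uses a weighted Kullback-Leibler divergence, an arbitrary nonnegative weight matrix `M` of the same size as the spectrogram `V`, and a binary `activity` matrix (sources by frames) standing in for the temporal annotation. The function and parameter names (`weighted_nmf`, `k_per_source`, and so on) are hypothetical.

```python
import numpy as np

def weighted_kl(V, V_hat, M):
    # Weighted KL divergence: sum of M * (V log(V / V_hat) - V + V_hat).
    # Assumes V and V_hat are strictly positive.
    return float(np.sum(M * (V * np.log(V / V_hat) - V + V_hat)))

def weighted_nmf(V, M, activity, k_per_source=2, n_iter=100, seed=0, eps=1e-12):
    """Sketch of annotation-guided weighted NMF (assumed formulation).

    V        : (F, N) positive magnitude or power spectrogram
    M        : (F, N) nonnegative weights (hypothetical weighting)
    activity : (J, N) binary annotation, 1 where source j is active
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    J = activity.shape[0]
    K = J * k_per_source
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    # Each source owns k_per_source consecutive rows of H.
    mask = np.repeat(activity.astype(float), k_per_source, axis=0)
    # Zero the activations where a source is annotated as silent;
    # multiplicative updates keep these zeros at zero.
    H *= mask
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        W *= ((M * V / V_hat) @ H.T) / (M @ H.T + eps)
        V_hat = W @ H + eps
        H *= (W.T @ (M * V / V_hat)) / (W.T @ M + eps)
        costs.append(weighted_kl(V, W @ H + eps, M))
    return W, H, mask, costs
```

Setting `M` to all ones reduces this to plain KL-NMF with annotation-constrained activations, which is one way to compare a weighted variant against an unweighted baseline under the same assumptions.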
