A comparative study of example-guided audio source separation approaches based on nonnegative matrix factorization

We consider example-guided audio source separation, in which the mixture to be separated is supplied together with source examples that are assumed to match the sources in the mixture in both frequency and time. Such approaches have been applied successfully to tasks such as source separation by humming, score-informed music source separation, and music source separation guided by covers. Most of the proposed methods are based on nonnegative matrix factorization (NMF) and its variants. They include methods that use NMF models pre-trained on the examples to initialize the NMF decomposition of the mixture, methods that use those models as hyperparameters of priors on the mixture NMF decomposition, and methods that use coupled NMF models. These methods also differ in the choice of NMF divergence and NMF prior. However, no systematic comparison of all these methods exists. In this work, we compare existing methods and some new variants on the score-informed and cover-guided source separation tasks.
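
To make the first family of methods concrete, the following is a minimal sketch of initialization-based guidance, not the implementation evaluated in this work. It assumes magnitude spectrograms stored as NumPy arrays, one example per source with the same shape as the mixture (that is, already aligned in time and frequency), the Kullback-Leibler divergence with standard multiplicative updates, and soft-mask reconstruction. The names `nmf_kl` and `separate_with_examples`, the number of components, and the iteration count are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, W, H, n_iter=100):
    """KL-divergence NMF by multiplicative updates, starting from (W, H)."""
    W, H = W.copy(), H.copy()
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def separate_with_examples(V_mix, V_examples, n_components=4, n_iter=100, seed=0):
    """Pre-train one NMF model per example, then use the stacked models
    to initialize the NMF decomposition of the mixture (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    F, N = V_mix.shape
    W_parts, H_parts = [], []
    for V_ex in V_examples:
        # Random positive initialization for the example model.
        W0 = rng.random((F, n_components)) + EPS
        H0 = rng.random((n_components, N)) + EPS
        W_ex, H_ex = nmf_kl(V_ex, W0, H0, n_iter)
        W_parts.append(W_ex)
        H_parts.append(H_ex)

    # Mixture NMF initialized with the example models (spectral and temporal).
    W, H = nmf_kl(V_mix, np.hstack(W_parts), np.vstack(H_parts), n_iter)

    # Soft masks: each source's share of the modeled mixture spectrogram.
    V_hat = W @ H + EPS
    estimates = []
    for j in range(len(V_examples)):
        sl = slice(j * n_components, (j + 1) * n_components)
        estimates.append((W[:, sl] @ H[sl, :]) / V_hat * V_mix)
    return estimates
```

The prior-based and coupled variants mentioned above would replace the initialization step: instead of only starting from the example models, the mixture factors would be penalized for deviating from them or shared with them during the updates.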
