An interactive audio source separation framework based on non-negative matrix factorization

Though audio source separation has a wide range of applications in audio enhancement and post-production, its performance is not yet satisfactory, especially for single-channel mixtures with limited training data. In this paper we present a novel interactive source separation framework that allows end-users to provide feedback at each separation step so as to gradually improve the result. For this purpose, a prototype graphical user interface (GUI) is developed to help users annotate time-frequency regions of the displayed spectrogram, labeling a source in each region as active, inactive, or well-separated. This user feedback, which partly goes beyond the annotation types used in state-of-the-art methods, is then taken into account in a proposed uncertainty-based learning algorithm to constrain the source estimates in the next separation step. The framework is based on non-negative matrix factorization and is shown to be effective even without any isolated training data.
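To make the idea of annotation-constrained factorization concrete, the following is a minimal sketch, not the uncertainty-based algorithm proposed in the paper. It assumes a simplified setting in which the user marks whole time frames (rather than time-frequency regions) where a source is inactive, and it uses standard multiplicative updates for the Kullback-Leibler divergence. The function name `annotated_nmf` and all of its parameters are hypothetical.

```python
import numpy as np

def annotated_nmf(V, comps_per_source, inactive, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative spectrogram V (F x N) as W @ H.

    comps_per_source: number of NMF components assigned to each source.
    inactive: boolean array (n_sources x N); True where the user marked
              the source as inactive (assumed frame-level annotation).
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    n_sources = len(comps_per_source)
    K = sum(comps_per_source)
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps

    # owner[k] is the index of the source that component k belongs to.
    owner = np.repeat(np.arange(n_sources), comps_per_source)

    # Zero the activations of a source in frames annotated as inactive.
    # Multiplicative updates keep these entries at exactly zero.
    H[inactive[owner]] = 0.0

    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)

    # Soft-mask (Wiener-like) estimate of each source spectrogram.
    WH = W @ H + eps
    estimates = [
        (W[:, owner == j] @ H[owner == j]) / WH * V for j in range(n_sources)
    ]
    return W, H, estimates
```

In an interactive loop, each round of user feedback would update `inactive` and the factorization would be re-run; handling the "active" and "well-separated" labels described above would require additional constraints that this sketch does not model.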
