On-the-fly audio source separation

This paper addresses the challenging task of single-channel audio source separation. We introduce a novel concept of on-the-fly audio source separation, which greatly simplifies the user's interaction with the system compared to state-of-the-art user-guided approaches. In the proposed framework, the user only needs to listen to an audio mixture and type keywords (e.g. “dog barking”, “wind”) describing the sound sources to be separated. These keywords are then used as text queries to retrieve audio examples from the internet, which guide the separation process. In particular, we propose several approaches to efficiently exploit these retrieved examples, including an approach based on a generic spectral model with group sparsity-inducing constraints. Finally, we demonstrate the effectiveness of the proposed framework on mixtures containing various types of sounds.
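
As a rough illustration of the group-sparsity idea mentioned above, the following sketch learns a small spectral basis from each example, stacks the bases into one generic dictionary, and then estimates activations on the mixture with a penalty that pushes whole groups (one per example) toward zero. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the KL-divergence multiplicative updates, the log/l1 group penalty, the function names (learn_basis, group_sparse_activations, separate), and the synthetic spectrograms standing in for retrieved examples are all choices made here for illustration.

```python
import numpy as np

def learn_basis(V, rank, n_iter=100, seed=0, eps=1e-9):
    """KL-NMF on one example spectrogram V (freq x frames); returns the basis W."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    # Normalize columns so that activations carry the scale.
    return W / (W.sum(axis=0, keepdims=True) + eps)

def group_sparse_activations(V, W, groups, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Estimate H >= 0 with W fixed, penalizing sum_g log(eps + ||H_g||_1)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        penalty = np.zeros_like(H)
        for idx in groups:
            # Gradient of the log penalty: the same value for every entry of a group.
            penalty[idx, :] = lam / (eps + H[idx, :].sum())
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + penalty)
    return H

def separate(V, W, H, source_groups, eps=1e-9):
    """Soft-mask the mixture: one estimate per source (list of row-index arrays)."""
    total = W @ H + eps
    return [V * (W[:, idx] @ H[idx, :]) / total for idx in source_groups]

# --- Toy data (assumption: synthetic magnitudes replace real retrieved audio) ---
rng = np.random.default_rng(0)
F, N, RANK = 64, 40, 2

def toy_spectrogram(peak):
    """Rank-1 spectrogram with a spectral bump around frequency bin `peak`."""
    profile = np.exp(-0.5 * ((np.arange(F) - peak) / 3.0) ** 2) + 0.01
    return np.outer(profile, rng.random(N) + 0.1)

# Two hypothetical examples retrieved for "dog barking", one for "wind".
examples = [toy_spectrogram(10), toy_spectrogram(12), toy_spectrogram(45)]
mixture = toy_spectrogram(10) + toy_spectrogram(45)

# Stack the per-example bases into one dictionary; each example is one group.
W = np.hstack([learn_basis(E, RANK, seed=i) for i, E in enumerate(examples)])
groups = [np.arange(i * RANK, (i + 1) * RANK) for i in range(len(examples))]
H = group_sparse_activations(mixture, W, groups, lam=1.0)

# Groups 0 and 1 belong to the first keyword, group 2 to the second.
source_groups = [np.concatenate(groups[:2]), groups[2]]
estimates = separate(mixture, W, H, source_groups)
```

The penalty weight lam trades reconstruction accuracy against how aggressively unused example groups are switched off; a real system would apply the resulting masks to the complex STFT of the mixture and invert it to obtain time-domain signals.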
