ISSE: an interactive source separation editor

Traditional audio editing tools do not facilitate the task of separating a single mixture recording (e.g. a pop song) into its respective sources (e.g. drums, vocals, etc.). Such an ability, however, would be useful for a wide variety of audio applications such as music remixing, audio denoising, and audio-based forensics. To address this issue, we present ISSE, an interactive source separation editor. ISSE is a new open-source, freely available, and cross-platform audio editing tool that enables a user to perform source separation by painting on time-frequency visualizations of sound, resulting in an interactive machine learning system. The system brings to life our previously proposed interaction paradigm and separation algorithm, which learns from user feedback to perform separation. For evaluation, we conducted user studies and compared results between inexperienced and expert users. Across a variety of real-world tasks, we found that inexperienced users can achieve good separation quality with minimal instruction, and expert users can achieve state-of-the-art separation quality.
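To make the interaction paradigm concrete, the sketch below shows one simplified way user paint strokes on a spectrogram can guide a learning-based separator: binary paint masks select time-frequency regions used to train a per-source KL-NMF dictionary, and the mixture is then explained with both dictionaries and soft-masked. This is only an illustrative stand-in, not ISSE's published algorithm (which is based on a posterior-regularized latent variable model); all function names, parameters, and constants here are hypothetical.

```python
# Illustrative sketch only: semi-supervised NMF separation guided by user
# "paint" masks on a magnitude spectrogram. Not ISSE's exact algorithm;
# names and constants are hypothetical.
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero


def kl_nmf(V, n_components, n_iter=200, W=None, update_W=True, seed=0):
    """Multiplicative-update NMF under the KL divergence (Lee & Seung style)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    if W is None:
        W = rng.random((F, n_components)) + EPS
    H = rng.random((n_components, T)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        # Standard KL multiplicative update for activations H.
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + EPS)
        if update_W:
            WH = W @ H + EPS
            # Standard KL multiplicative update for basis vectors W.
            W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + EPS)
    return W, H


def separate_with_paint(V, paint_src1, paint_src2, k=20, n_iter=200):
    """Separate a magnitude spectrogram V into two sources using paint masks.

    paint_src1 / paint_src2 are binary masks (same shape as V) marking
    time-frequency bins the user attributes to each source; they need not
    cover the whole spectrogram.
    """
    # 1) Learn a small dictionary for each source from its painted regions only.
    W1, _ = kl_nmf(V * paint_src1, k, n_iter)
    W2, _ = kl_nmf(V * paint_src2, k, n_iter)
    # 2) Explain the full mixture with both dictionaries held fixed.
    W = np.hstack([W1, W2])
    _, H = kl_nmf(V, 2 * k, n_iter, W=W, update_W=False)
    # 3) Soft-mask the mixture with each source's share of the model.
    V1_hat = W1 @ H[:k] + EPS
    V2_hat = W2 @ H[k:] + EPS
    total = V1_hat + V2_hat
    return V * (V1_hat / total), V * (V2_hat / total)
```

In practice V would be the magnitude of an STFT of the mixture, and the returned source magnitudes would be recombined with the mixture phase and inverted back to audio; an interactive editor would repeat this loop as the user adds or erases paint strokes.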
