ISSE: an interactive source separation editor

Traditional audio editing tools do not facilitate the task of separating a single mixture recording (e.g. a pop song) into its respective sources (e.g. drums, vocals, etc.). Such an ability, however, would be useful for a wide variety of audio applications such as music remixing, audio denoising, and audio-based forensics. To address this issue, we present ISSE, an interactive source separation editor. ISSE is a new open-source, freely available, and cross-platform audio editing tool that enables a user to perform source separation by painting on time-frequency visualizations of sound, resulting in an interactive machine learning system. The system brings to life our previously proposed interaction paradigm and separation algorithm, which learns from user feedback to perform separation. For evaluation, we conducted user studies and compared results between inexperienced and expert users. Across a variety of real-world tasks, we found that inexperienced users can achieve good separation quality with minimal instruction, and expert users can achieve state-of-the-art separation quality.
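To make the interaction paradigm concrete, the sketch below shows one simplified way user paint strokes on a spectrogram can guide a learning-based separator: binary paint masks select time-frequency regions used to train a per-source KL-NMF dictionary, and the mixture is then explained with both dictionaries and soft-masked. This is only an illustrative stand-in, not ISSE's published algorithm (which is based on a posterior-regularized latent variable model); all function names, parameters, and constants here are hypothetical.

```python
# Illustrative sketch only: semi-supervised NMF separation guided by user
# "paint" masks on a magnitude spectrogram. Not ISSE's exact algorithm;
# names and constants are hypothetical.
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero


def kl_nmf(V, n_components, n_iter=200, W=None, update_W=True, seed=0):
    """Multiplicative-update NMF under the KL divergence (Lee & Seung style)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    if W is None:
        W = rng.random((F, n_components)) + EPS
    H = rng.random((n_components, T)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        # Standard KL multiplicative update for activations H.
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + EPS)
        if update_W:
            WH = W @ H + EPS
            # Standard KL multiplicative update for basis vectors W.
            W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + EPS)
    return W, H


def separate_with_paint(V, paint_src1, paint_src2, k=20, n_iter=200):
    """Separate a magnitude spectrogram V into two sources using paint masks.

    paint_src1 / paint_src2 are binary masks (same shape as V) marking
    time-frequency bins the user attributes to each source; they need not
    cover the whole spectrogram.
    """
    # 1) Learn a small dictionary for each source from its painted regions only.
    W1, _ = kl_nmf(V * paint_src1, k, n_iter)
    W2, _ = kl_nmf(V * paint_src2, k, n_iter)
    # 2) Explain the full mixture with both dictionaries held fixed.
    W = np.hstack([W1, W2])
    _, H = kl_nmf(V, 2 * k, n_iter, W=W, update_W=False)
    # 3) Soft-mask the mixture with each source's share of the model.
    V1_hat = W1 @ H[:k] + EPS
    V2_hat = W2 @ H[k:] + EPS
    total = V1_hat + V2_hat
    return V * (V1_hat / total), V * (V2_hat / total)
```

In practice V would be the magnitude of an STFT of the mixture, and the returned source magnitudes would be recombined with the mixture phase and inverted back to audio; an interactive editor would repeat this loop as the user adds or erases paint strokes.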
