An Overview of Lead and Accompaniment Separation in Music

Popular music is often composed of an accompaniment and a lead component, the latter typically consisting of vocals. Filtering such mixtures to extract one or both components has many applications, such as automatic karaoke and remixing. This particular case of source separation yields very specific challenges and opportunities, including the particular complexity of musical structures, but also relevant prior knowledge coming from acoustics, musicology or sound engineering. Due to both its importance in applications and its challenging difficulty, lead and accompaniment separation has been a popular topic in signal processing for decades. In this article, we provide a comprehensive review of this research topic, organizing the different approaches according to whether they are model-based or data-centered. For model-based methods, we organize them according to whether they concentrate on the lead signal, the accompaniment, or both. For data-centered approaches, we discuss the particular difficulty of obtaining data for learning lead separation systems, and then review recent approaches, notably those based on deep learning. Finally, we discuss the delicate problem of evaluating the quality of music separation through adequate metrics and present the results of the largest evaluation, to-date, of lead and accompaniment separation systems. In conjunction with the above, a comprehensive list of references is provided, along with relevant pointers to available implementations and repositories.

[1]  Gunnar Fant,et al.  Acoustic Theory Of Speech Production , 1960 .

[2]  B. P. Bogert,et al.  The quefrency analysis of time series for echoes : cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking , 1963 .

[3]  A Tikhonov,et al.  Solution of Incorrectly Formulated Problems and the Regularization Method , 1963 .

[4]  Norbert Wiener,et al.  Extrapolation, Interpolation, and Smoothing of Stationary Time Series , 1964 .

[5]  A. Noll Short‐Time Spectrum and “Cepstrum” Techniques for Vocal‐Pitch Detection , 1964 .

[6]  A. Noll Cepstrum pitch determination. , 1967, The Journal of the Acoustical Society of America.

[7]  A. Oppenheim,et al.  Homomorphic analysis of speech , 1968 .

[8]  A. Oppenheim Speech analysis-synthesis system based on homomorphic filtering. , 1969, The Journal of the Acoustical Society of America.

[9]  James M. Ortega,et al.  Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.

[10]  Neil Joseph Miller Removal of Noise from a Voice Signal by Synthesis , 1973 .

[11]  J.A. Moorer,et al.  Signal processing aspects of computer music: A survey , 1977, Proceedings of the IEEE.

[12]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[13]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[14]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[15]  N. Otsu A threshold selection method from gray level histograms , 1979 .

[16]  Ernst Terhardt,et al.  Calculating virtual pitch , 1979, Hearing Research.

[17]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[18]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[19]  Thomas F. Quatieri,et al.  Speech analysis/Synthesis based on a sinusoidal representation , 1986, IEEE Trans. Acoust. Speech Signal Process..

[20]  Julius O. Smith,et al.  PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation , 1987, ICMC.

[21]  D. J. Hermes,et al.  Measurement of pitch by subharmonic summation. , 1988, The Journal of the Acoustical Society of America.

[22]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[23]  Robert C. Maher,et al.  An approach for the separation of voices in composite musical signals , 1989 .

[24]  D. Klatt,et al.  Analysis, synthesis, and perception of voice quality variations among female and male talkers. , 1990, The Journal of the Acoustical Society of America.

[25]  R. Durrett Probability: Theory and Examples , 1993 .

[26]  Hugo Fastl,et al.  Psychoacoustics: Facts and Models , 1990 .

[27]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[28]  B. Yegnanarayana,et al.  Processing of noisy speech using modified group delay functions , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[29]  Judith C. Brown Calculation of a constant Q spectral transform , 1991 .

[30]  Anil K. Jain,et al.  Unsupervised texture segmentation using Gabor filters , 1990, 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings.

[31]  Judith C. Brown,et al.  An efficient algorithm for the calculation of a constant Q transform , 1992 .

[32]  Thomas F. Quatieri,et al.  Shape invariant time-scale and pitch modification of speech , 1992, IEEE Trans. Signal Process..

[33]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[34]  P. Boersma ACCURATE SHORT-TERM ANALYSIS OF THE FUNDAMENTAL FREQUENCY AND THE HARMONICS-TO-NOISE RATIO OF A SAMPLED SOUND , 1993 .

[35]  Jonathan G. Fiscus,et al.  DARPA TIMIT:: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .

[36]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[37]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[38]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[39]  Richard F. Lyon,et al.  Auditory model inversion for sound separation , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[40]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[41]  Avery Li-Chun Wang,et al.  Instantaneous and frequency-warped signal processing techniques for auditory source separation , 1994 .

[42]  A. L. Wang Instantaneous and frequency-warped techniques for source separation and signal parametrization , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Accoustics.

[43]  Pascal Scalart,et al.  Speech enhancement based on a priori signal to noise estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[44]  Xavier Serra,et al.  Musical Sound Modeling with Sinusoids plus Noise , 1997 .

[45]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[46]  Xavier Rodet Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models , 1997 .

[47]  Yinong Ding,et al.  Processing of Musical Tones Using a Combined Quadratic Polynomial-Phase Sinusoid and Residual (QUASAR) Signal Model , 1997 .

[48]  Keikichi Hirose,et al.  Separation of singing and piano sounds , 1998, ICSLP.

[49]  Kevin M. Buckley,et al.  Beamforming Techniques for Spatial Filtering , 1999 .

[50]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[51]  Jonathan Foote,et al.  Visualizing music and audio using self-similarity , 1999, MULTIMEDIA '99.

[52]  Adrian E. Raftery,et al.  Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors , 1999 .

[53]  Aapo Hyvärinen,et al.  Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.

[54]  S. Shamma,et al.  Spectro-temporal modulation transfer functions and speech intelligibility. , 1999, The Journal of the Acoustical Society of America.

[55]  Ravi Kalakota,et al.  e-Business 2.0: Roadmap for Success , 2000 .

[56]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[57]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[58]  Sam T. Roweis,et al.  One Microphone Source Separation , 2000, NIPS.

[59]  Michael A. Casey,et al.  Separation of Mixed Audio Sources By Independent Subspace Analysis , 2000, ICMC.

[60]  William M. Hartmann,et al.  Psychoacoustics: Facts and Models , 2001 .

[61]  Shingo Uchihashi,et al.  The beat spectrum: a new approach to rhythm analysis , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[62]  C. Bernard Discrete Wavelet Analysis for Fast Optic Flow Computation , 2001 .

[63]  Andries P. Hekstra,et al.  Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[64]  Bernard C. Y. Tan,et al.  The Internet is changing the music industry , 2001, Commun. ACM.

[65]  Shlomo Dubnov,et al.  Robust temporal and spectral modeling for query By melody , 2002, SIGIR '02.

[66]  Carlos Avendano,et al.  Frequency Domain Techniques for Stereo to Multichannel Upmix , 2002 .

[67]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[68]  Özgür Yilmaz,et al.  On the approximate W-disjoint orthogonality of speech , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[69]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[70]  Guy J. Brown,et al.  A multi-pitch tracking algorithm for noisy speech , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[71]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[72]  M. Davies,et al.  Complex domain onset detection for musical signals , 2003 .

[73]  Geoffroy Peeters Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: "Sequence" and "State" Approach , 2003, CMMR.

[74]  C. Avendano,et al.  Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[75]  Rémi Gribonval,et al.  Non negative sparse representation for Wiener based source separation with a single sensor , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[76]  Changshui Zhang,et al.  Clustering in Knowledge Embedded Space , 2003, ECML.

[77]  Shlomo Dubnov,et al.  Optimal Filtering of an Instrument Sound in a Mixed Recording Given Approximate Pitch Prior , 2004, ICMC.

[78]  Ye Wang,et al.  Automatic Detection Of Vocal Segments In Popular Songs , 2004, ISMIR.

[79]  DeLiang Wang,et al.  Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, IEEE Transactions on Neural Networks.

[80]  Masataka Goto,et al.  A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals , 2004, Speech Commun..

[81]  Scott Rickard,et al.  Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[82]  Dan Barry,et al.  Sound Source Separation: Azimuth Discrimination and Resynthesis , 2004 .

[83]  Richard M. Stern,et al.  Reconstruction of missing features for robust speech recognition , 2004, Speech Commun..

[84]  Hsin-Min Wang,et al.  Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics , 2004, Computer Music Journal.

[85]  M. Tribus,et al.  Probability theory: the logic of science , 2003 .

[86]  Inderjit S. Dhillon,et al.  Generalized Nonnegative Matrix Approximations with Bregman Divergences , 2005, NIPS.

[87]  Changshui Zhang,et al.  Separation of Music Signals by Harmonic Structure Modeling , 2005, NIPS.

[88]  Shankar Vembu,et al.  Separation of Vocals from Polyphonic Audio Recordings , 2005, ISMIR.

[89]  Hiromasa Fujihara,et al.  Singer Identification Based on Accompaniment Sound Reduction and Reliable Frame Selection , 2005, ISMIR.

[90]  Powen Ru,et al.  Multiresolution spectrotemporal analysis of complex sounds. , 2005, The Journal of the Acoustical Society of America.

[91]  Changshui Zhang,et al.  Separation of Voice and Music by Harmonic Structure Stability Analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[92]  DeLiang Wang,et al.  Detecting pitch of singing voice in polyphonic audio , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[93]  Eric Moulines,et al.  Inference in Hidden Markov Models (Springer Series in Statistics) , 2005 .

[94]  B. Raj,et al.  Latent variable decomposition of spectrograms for single channel speaker separation , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[95]  P. Philippe,et al.  One microphone singing voice separation using source-adapted models , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[96]  Rémi Gribonval,et al.  BSS_EVAL Toolbox User Guide -- Revision 2.0 , 2005 .

[97]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[98]  Emmanuel Vincent,et al.  Preliminary guidelines for subjective evalutation of audio source separation algorithms , 2006 .

[99]  Rémi Gribonval,et al.  Audio source separation with a single sensor , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[100]  Alex Loscos,et al.  Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking , 2006 .

[101]  DeLiang Wang,et al.  Singing Voice Separation from Monaural Recordings , 2006, ISMIR.

[102]  Anssi Klapuri,et al.  Transcription of the Singing Melody in Polyphonic Music , 2006, ISMIR.

[103]  A.J. Viterbi A personal history of the Viterbi algorithm , 2006, IEEE Signal Processing Magazine.

[104]  Anssi Klapuri,et al.  Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes , 2006, ISMIR.

[105]  Karin Dressler,et al.  SINUSOIDAL EXTRACTION USING AN EFFICIENT IMPLEMENTATION OF A MULTI-RESOLUTION FFT , 2006 .

[106]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[107]  Hirokazu Kameoka,et al.  Selective Amplifier of Periodic and Non-periodic Components in Concurrent Audio Signals with Spectral Control Envelopes , 2006 .

[108]  DeLiang Wang,et al.  Separation of Singing Voice From Music Accompaniment for Monaural Recordings , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[109]  Anssi Klapuri,et al.  Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods , 2007, ISMIR.

[110]  H. Robbins A Stochastic Approximation Method , 1951 .

[111]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[112]  Te-Won Lee,et al.  Blind Speech Separation , 2007, Blind Speech Separation.

[113]  Mathieu Lagrange,et al.  Sound Source Tracking and Formation using Normalized Cuts , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[114]  Christopher Raphael,et al.  Desoloing Monaural Audio Using Mixture Models , 2007, ISMIR.

[115]  Bhiksha Raj,et al.  Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.

[116]  P. Smaragdis,et al.  Shift-Invariant Probabilistic Latent Component Analysis , 2007 .

[117]  Bryan Pardo,et al.  Modeling Perceptual Similarity of Audio Signals for Blind Source Separation Evaluation , 2007, ICA.

[118]  Rémi Gribonval,et al.  Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[119]  Bryan Pardo,et al.  Towards a Model of Perceived Quality of Blind Audio Source Separation , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[120]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[121]  Gaël Richard,et al.  Singer melody extraction in polyphonic signals using source separation methods , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[122]  Anssi Klapuri,et al.  Automatic Transcription of Melody, Bass Line, and Chords in Polyphonic Music , 2008, Computer Music Journal.

[123]  A. Kondoz,et al.  Comparison of subjective and objective evaluation methods for audio source separation , 2008 .

[124]  Anssi Klapuri,et al.  Accompaniment separation and karaoke application based on automatic melody transcription , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[125]  Tuomas Virtanen,et al.  Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music , 2008, SAPA@INTERSPEECH.

[126]  Maximo Cobos,et al.  Singing Voice Separation Combining Panning Information and Pitch Tracking , 2008 .

[127]  A. Chanrungutai,et al.  Singing Voice Separation in Mono-Channel Music , 2008, 2008 International Symposium on Communications and Information Technologies.

[128]  Hirokazu Kameoka,et al.  A Real-time Equalizer of Harmonic and Percussive Components in Music Signals , 2008, ISMIR.

[129]  Maximo Cobos,et al.  Stereo audio source separation based on time-frequency masking and multilevel thresholding , 2008, Digit. Signal Process..

[130]  Hirokazu Kameoka,et al.  Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram , 2008, 2008 16th European Signal Processing Conference.

[131]  Christopher Raphael,et al.  A Classifier-Based Approach to Score-Guided Source Separation of Musical Audio , 2008, Computer Music Journal.

[132]  Mathieu Lagrange,et al.  Normalized Cuts for Predominant Melodic Source Separation , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[133]  Changshui Zhang,et al.  Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[134]  A. Chanrungutai,et al.  Singing voice separation for mono-channel music using Non-negative Matrix Factorization , 2008, 2008 International Conference on Advanced Technologies for Communications.

[135]  Eric Plourde,et al.  Auditory-Based Spectral Amplitude Estimators for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[136]  Masataka Goto,et al.  Music Structure Analysis from Acoustic Signals , 2008 .

[137]  Jyh-Shing Roger Jang,et al.  Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings , 2008 .

[138]  Alexey Ozerov,et al.  Multichannel nonnegative matrix factorization in convolutive mixtures. With application to blind audio source separation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[139]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[140]  Paris Smaragdis,et al.  Separation by “humming”: User-guided sound extraction from monophonic mixtures , 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[141]  Volker Gnann SOURCE-FILTER BASED CLUSTERING FOR MONAURAL BLIND SOURCE SEPARATION , 2009 .

[142]  Ali Taylan Cemgil,et al.  Bayesian Inference for Nonnegative Matrix Factorisation Models , 2009, Comput. Intell. Neurosci..

[143]  Gaël Richard,et al.  Main instrument separation from stereophonic audio signals using a source/filter model , 2009, 2009 17th European Signal Processing Conference.

[144]  Alan C. Bovik,et al.  Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures , 2009, IEEE Signal Processing Magazine.

[145]  DeLiang Wang,et al.  Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[146]  Ole Winther,et al.  Bayesian Non-negative Matrix Factorization , 2009, ICA.

[147]  Gaël Richard,et al.  An iterative approach to monaural musical mixture de-soloing , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[148]  Emmanuel Vincent,et al.  The 2008 Signal Separation Evaluation Campaign: A Community-Based Approach to Large-Scale Evaluation , 2009, ICA.

[149]  Guillermo Sapiro,et al.  Online dictionary learning for sparse coding , 2009, ICML '09.

[150]  Jyh-Shing Roger Jang,et al.  Singing Pitch Extraction from Monaural Polyphonic Songs by Contextual Audio Modeling and Singing Harmonic Enhancement , 2009, ISMIR.

[151]  D. Fitzgerald,et al.  Using Tensor Factorisation Models to Separate Drums from Polyphonic Music , 2009 .

[152]  Corey Cheng,et al.  MELODY LINE DETECTION AND SOURCE SEPARATION IN CLASSICAL SAXOPHONE RECORDINGS , 2009 .

[153]  Minje Kim,et al.  Nonnegative matrix partial co-factorization for drum source separation , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[154]  Derry Fitzgerald,et al.  Harmonic/Percussive Separation Using Median Filtering , 2010 .

[155]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[156]  Alexey Ozerov,et al.  Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues , 2010, CMMR.

[157]  Meinard Müller,et al.  Audio-based Music Structure Analysis , 2010 .

[158]  Bhiksha Raj,et al.  Latent-variable decomposition based dereverberation of monaural and multi-channel signals , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[159]  Changshui Zhang,et al.  Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[160]  Richard Polfreman,et al.  SINGING VOICE SEPARATION BASED ON NON-VOCAL INDEPENDENT COMPONENT SUBTRACTION AND AMPLITUDE DISCRIMINATION , 2010 .

[161]  Derry Fitzgerald,et al.  Single Channel Vocal Separation using Median Filtering and Factorisation Techniques , 2010 .

[162]  Shigeki Sagayama,et al.  Melody line estimation in homophonic music audio signals based on temporal-variability of melodic source , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[163]  Fabian J. Theis,et al.  The 2010 Signal Separation Evaluation Campaign (SiSEC2010): Audio Source Separation , 2010, LVA/ICA.

[164]  Rémi Gribonval,et al.  Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[165]  Emmanuel Vincent,et al.  Multi-criteria subjective and objective evaluation of audio source separation , 2010 .

[166]  Hirokazu Kameoka,et al.  Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks , 2010, Advances in Music Information Retrieval.

[167]  Emmanuel Vincent,et al.  A General Modular Framework for Audio Source Separation , 2010, LVA/ICA.

[168]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[169]  Pierre Comon,et al.  Handbook of Blind Source Separation: Independent Component Analysis and Applications , 2010 .

[170]  Hiromasa Fujihara,et al.  A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[171]  DeLiang Wang,et al.  A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[172]  Richard Polfreman,et al.  Towards effective singing voice extraction from stereophonic recordings , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[173]  Mark D. Plumbley,et al.  Probabilistic Modeling Paradigms for Audio Source Separation , 2010 .

[174]  Gaël Richard,et al.  Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[175]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[176]  Ben Taskar,et al.  Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res..

[177]  Preeti Rao,et al.  Vocal Melody Extraction in the Presence of Pitched Accompaniment in Polyphonic Music , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[178]  Antoine Liutkus,et al.  Informed Source Separation Using Latent Components , 2010, LVA/ICA.

[179]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[180]  Jyh-Shing Roger Jang,et al.  On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[181]  Bhiksha Raj,et al.  Non-negative Hidden Markov Modeling of Audio with Application to Source Separation , 2010, LVA/ICA.

[182]  Alexey Ozerov,et al.  Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[183]  Emmanuel Vincent,et al.  Subjective and Objective Quality Assessment of Audio Source Separation , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[184]  Ching-Wei Chen,et al.  Improving melody extraction using Probabilistic Latent Component Analysis , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[185]  Minje Kim,et al.  Gaussian Mixture Model for Singing Voice Separation from Stereophonic Music , 2011 .

[186]  Axel Röbel,et al.  Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[187]  Bryan Pardo,et al.  A simple music/voice separation method based on the extraction of the repeating musical structure , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[188]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[189]  Antoine Liutkus,et al.  Informed source separation: Source coding meets source separation , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[190]  Gerald Schuller,et al.  Influence of Phase, Magnitude and Location ofHarmonic Components in the Perceived Quality of Extracted Solo Signals , 2011, Semantic Audio.

[191]  C. Joder,et al.  A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[192]  Christian Dittmar,et al.  Songs2See: Learn to Play by Playing , 2011 .

[193]  Preeti Rao,et al.  Context-Aware Features for Singing Voice Detection in Polyphonic Music , 2011, Adaptive Multimedia Retrieval.

[194]  Zhijian Ou,et al.  Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[195]  Gaël Richard,et al.  A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[196]  Hirokazu Kameoka,et al.  New formulations and efficient algorithms for multichannel NMF , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[197]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[198]  Israel Cohen,et al.  Transient Noise Reduction Using Nonlocal Diffusion Filters , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[199]  Udo Zoelzer,et al.  DAFX: Digital Audio Effects , 2011 .

[200]  Minje Kim,et al.  Nonnegative Matrix Partial Co-Factorization for Spectral and Temporal Drum Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[201]  Antoine Liutkus,et al.  Gaussian Processes for Underdetermined Source Separation , 2011, IEEE Transactions on Signal Processing.

[202]  Xiaofei He,et al.  Robust non-negative matrix factorization , 2011 .

[203]  Karin Dressler,et al.  Pitch Estimation by the Pair-Wise Evaluation of Spectral Peaks , 2011, Semantic Audio.

[204]  Jordi Bonada,et al.  Predominant Fundamental Frequency Estimation vs Singing Voice Separation for the Automatic Transcription of Accompanied Flamenco Singing , 2012, ISMIR.

[205]  Derry Fitzgerald Vocal separation using nearest neighbours and median filtering , 2012 .

[206]  Gerald Schuller,et al.  Efficient implementation of a system for solo and accompaniment separation in polyphonic music , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[207]  DeLiang Wang,et al.  A Tandem Algorithm for Singing Pitch Extraction and Voice Separation From Music Accompaniment , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[208]  Michael Elad,et al.  Audio Inpainting , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[209]  Jakob Abeßer,et al.  Music Information Retrieval Meets Music Education , 2012, Multimodal Music Processing.

[210]  Nicolas Sturmel,et al.  Linear Mixing Models for Active Listening of Music Productions in Realistic Studio Conditions , 2012 .

[211]  Roland Badeau,et al.  Blind Harmonic Adaptive Decomposition applied to supervised source separation , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[212]  Bryan Pardo,et al.  Music/Voice Separation Using the Similarity Matrix , 2012, ISMIR.

[213]  Jean-Philippe Thiran,et al.  Musical Audio Source Separation Based on User-Selected F0 Track , 2012, LVA/ICA.

[214]  Fabian J. Theis,et al.  The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges , 2012, Signal Process..

[215]  Jordi Janer,et al.  Combining a harmonic-based NMF decomposition with transient analysis for instantaneous percussion separation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[216]  Jordi Janer,et al.  Score-informed and timbre independent lead instrument separation in real-world scenarios , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[217]  Emmanuel Vincent,et al.  A General Flexible Framework for the Handling of Prior Information in Audio Source Separation , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[218]  Antoine Liutkus,et al.  Adaptive filtering for music/voice separation exploiting the repeating musical structure , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[219]  Jordi Bonada,et al.  Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models , 2012, LVA/ICA.

[220]  Hirokazu Kameoka,et al.  Efficient algorithms for multichannel extensions of Itakura-Saito nonnegative matrix factorization , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[221]  Andreas Ziehe,et al.  The 2011 Signal Separation Evaluation Campaign (SiSEC2011): - Audio Source Separation - , 2012, LVA/ICA.

[222]  Richard Polfreman,et al.  H-Semantics: A Hybrid Approach to Singing Voice Separation , 2012 .

[223]  Jordi Janer,et al.  A Tikhonov regularization method for spectrum decomposition in low latency audio source separation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[224]  Gaël Richard,et al.  Audio source separation informed by redundancy with greedy multiscale decompositions , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[225]  Guillermo Sapiro,et al.  Real-time Online Singing Voice Separation from Monaural Recordings Using Robust Low-rank Modeling , 2012, ISMIR.

[226]  Yi-Hsuan Yang,et al.  On sparse and low-rank matrix decomposition for singing voice separation , 2012, ACM Multimedia.

[227]  Björn W. Schuller,et al.  Score-Informed Leading Voice Separation from Monaural Audio , 2012, ISMIR.

[228]  Emmanuel Vincent,et al.  Improved Perceptual Metrics for the Evaluation of Audio Source Separation , 2012, LVA/ICA.

[229]  Paris Smaragdis,et al.  Singing-voice separation from monaural recordings using robust principal component analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[230]  Hirokazu Kameoka,et al.  Comparative evaluations of various harmonic/percussive sound separation algorithms based on anisotropic continuity of spectrogram , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[231]  Ruijiang Li,et al.  Multi-Stage Non-Negative Matrix Factorization for Monaural Singing Voice Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[232]  Derry Fitzgerald,et al.  Improved stereo instrumental track recovery using median nearest-neighbour inpainting , 2013 .

[233]  Bryan Pardo,et al.  REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[234]  Gautham J. Mysore,et al.  Combining Modeling Of Singing Voice And Background Music For Automatic Separation Of Musical Mixtures , 2013, ISMIR.

[235]  Gautham J. Mysore,et al.  An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation , 2013, ICML.

[236]  Gautham J. Mysore,et al.  Interactive refinement of supervised and semi-supervised sound source separation estimates , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[237]  Ngoc Q. K. Duong,et al.  Weighted nonnegative tensor factorization: on monotonicity of multiplicative update rules and application to user-guided audio source separation , 2013 .

[238]  Benjamin Schrauwen,et al.  Training and analyzing deep recurrent neural networks , 2013, NIPS 2013.

[239]  Jordi Janer,et al.  MODELLING AND SEPARATION OF SINGING VOICE BREATHINESS IN POLYPHONIC MIXTURES , 2013 .

[240]  Hirokazu Kameoka,et al.  Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[241]  Jordi Janer,et al.  Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF , 2013 .

[242]  Yi-Hsuan Yang,et al.  Low-Rank Representation of Both Singing Voice and Music Accompaniment Via Learned Dictionaries , 2013, ISMIR.

[243]  Benjamin Schrauwen,et al.  Training and Analysing Deep Recurrent Neural Networks , 2013, NIPS.

[244]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[245]  Antoine Liutkus,et al.  An overview of informed audio source separation , 2013, 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS).

[246]  Yong Yu,et al.  Robust Recovery of Subspace Structures by Low-Rank Representation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[247]  Christopher Ré,et al.  Parallel stochastic gradient algorithms for large-scale matrix completion , 2013, Mathematical Programming Computation.

[248]  Emmanuel Vincent,et al.  Introducing a simple fusion framework for audio source separation , 2013, 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[249]  Nicholas J. Bryan Interactive User-Feedback for Sound Source Separation , 2013 .

[250]  Gerald Schuller,et al.  RE-THINKING SOUND SEPARATION: PRIOR INFORMATION AND ADDITIVITY CONSTRAINT IN SEPARATION ALGORITHMS , 2013 .

[251]  Bryan Pardo,et al.  Online REPET-SIM for real-time speech enhancement , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[252]  Feiping Nie,et al.  Joint Schatten $$p$$p-norm and $$\ell _p$$ℓp-norm robust matrix completion for missing value recovery , 2013, Knowledge and Information Systems.

[253]  Mohamed-Jalal Fadili,et al.  A Generalized Forward-Backward Splitting , 2011, SIAM J. Imaging Sci..

[254]  Frederick Z. Yen,et al.  Singing Voice Separation Using Spectro-Temporal Modulation Features , 2014, ISMIR.

[255]  Joakim Andén,et al.  Deep Scattering Spectrum , 2013, IEEE Transactions on Signal Processing.

[256]  Preeti Rao,et al.  Vocal Separation using Singer-Vowel Priors Obtained from Polyphonic Audio , 2014, ISMIR.

[257]  Gerald Schuller,et al.  Pitch-informed solo and accompaniment separation towards its use in music education applications , 2014, EURASIP J. Adv. Signal Process..

[258]  Xabier Jaureguiberry,et al.  The Flexible Audio Source Separation Toolbox Version 2.0 , 2014, ICASSP 2014.

[259]  Kyogu Lee,et al.  Vocal Separation from Monaural Music Using Temporal/Spectral Continuity and Sparsity Constraints , 2014, IEEE Signal Processing Letters.

[260]  Razvan Pascanu,et al.  How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[261]  Antoine Liutkus,et al.  REPET for Background/Foreground Separation in Audio , 2014 .

[262]  Daniel P. W. Ellis,et al.  Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges , 2014, IEEE Signal Processing Magazine.

[263]  Kyogu Lee,et al.  Vocal separation using extended robust principal component analysis with Schatten p/lp-norm and scale compression , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[264]  Sascha Disch,et al.  Extending Harmonic-Percussive Separation of Audio Signals , 2014, ISMIR.

[265]  Bryan Pardo,et al.  Combining Rhythm-Based and Pitch-Based Methods for Background and Melody Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[266]  Shigeki Sagayama,et al.  Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[267]  Paris Smaragdis,et al.  Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks , 2014, ISMIR.

[268]  Antoine Liutkus,et al.  Kernel Additive Models for Source Separation , 2014, IEEE Transactions on Signal Processing.

[269]  Gautham J. Mysore,et al.  Exploiting long-term temporal dependencies in NMF using recurrent neural networks with application to source separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[270]  Jen-Tzung Chien,et al.  Bayesian Singing-Voice Separation , 2014, ISMIR.

[271]  Rémi Gribonval,et al.  From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound , 2014, IEEE Signal Processing Magazine.

[272]  Siu Wa Lee,et al.  Soft constrained leading voice separation with music score guidance , 2014, The 9th International Symposium on Chinese Spoken Language Processing.

[273]  Mathias Johansson Blind Source Separation , 2014, Encyclopedia of Social Network Analysis and Mining.

[274]  Ngoc Q. K. Duong,et al.  On monotonicity of multiplicative update rules for weighted nonnegative tensor factorization , 2014 .

[275]  Matthias Mauch,et al.  MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research , 2014, ISMIR.

[276]  Andrzej Cichocki,et al.  Nonnegative Matrix and Tensor Factorizations : An algorithmic perspective , 2014, IEEE Signal Processing Magazine.

[277]  Paris Smaragdis,et al.  Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[278]  Emmanuel Vincent,et al.  Variational Bayesian model averaging for audio source separation , 2014, 2014 IEEE Workshop on Statistical Signal Processing (SSP).

[279]  Antoine Liutkus,et al.  Kernel spectrogram models for source separation , 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).

[280]  Daniel P. W. Ellis,et al.  Music-Content-Adaptive Robust Principal Component Analysis for a Semantically Consistent Separation of Foreground and Background in Music Audio Signals , 2014, DAFx.

[281]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[282]  Stéphane Mallat,et al.  Audio source separation with time-frequency velocities , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[283]  Lu Gan,et al.  Separation of vocals from monaural music recordings using diagonal median filters and practical time-frequency parameters , 2015, 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT).

[284]  Antoine Liutkus,et al.  Robust ASR using neural network based speech enhancement and feature simulation , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[285]  Jun-Yong Lee,et al.  Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting , 2015, INTERSPEECH.

[286]  Jin Young Kim,et al.  Music/Voice Separation Based on Kernel Back-fitting Using Weighted β-order MMSE Estimation , 2015 .

[287]  Katsutoshi Itoyama,et al.  A music performance assistance system based on vocal, harmonic, and percussive source separation and content visualization for music audio signals , 2015 .

[288]  Meinard Müller,et al.  Extracting singing voice from music recordings by cascading audio decomposition techniques , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[289]  Franck Giron,et al.  Deep neural network based instrument extraction from music , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[290]  Wei Li,et al.  Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[291]  Frederick Z. Yen,et al.  A two-stage singing voice separation algorithm using spectro-temporal modulation features , 2015, INTERSPEECH.

[292]  Linwei Li,et al.  Towards Solving the Bottleneck of Pitch-based Singing Voice Separation , 2015, ACM Multimedia.

[293]  Paris Smaragdis,et al.  Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[294]  Katsutoshi Itoyama,et al.  Singing voice analysis and editing based on mutually dependent F0 estimation and source separation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[295]  Jun-Yong Lee,et al.  Singing Voice Separation from Monaural Music Based on Kernel Back-Fitting Using Beta-Order Spectral Amplitude Estimation , 2015, ISMIR.

[296]  Guizhong Liu,et al.  Separation of Singing Voice Using Nonnegative Matrix Partial Co-Factorization for Singer Identification , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[297]  Hirokazu Kameoka,et al.  Lp-norm non-negative matrix factorization and its application to singing voice enhancement , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[298]  Mark D. Plumbley,et al.  Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network , 2015, LVA/ICA.

[299]  Antoine Liutkus,et al.  Kernel Additive Modeling for interference reduction in multi-channel music recordings , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[300]  Antoine Liutkus,et al.  Generalized Wiener filtering with fractional power spectrograms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[301]  Antoine Liutkus,et al.  Scalable audio separation with light Kernel Additive Modelling , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[302]  Masaaki Ikehara,et al.  Vocal separation by constrained non-negative matrix factorization , 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).

[303]  Yi-Hsuan Yang,et al.  Vocal activity informed singing voice separation with the iKala dataset , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[304]  Elliot Moore,et al.  On the perceptual relevance of objective source separation measures for singing voice separation , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[305]  Hyoung‐Gook Kim,et al.  Music and Voice Separation Using Log-Spectral Amplitude Estimator Based on Kernel Spectrogram Models Backfitting , 2015 .

[306]  Meinard Mller,et al.  Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications , 2015 .

[307]  Lu Gan,et al.  A local discontinuity based approach for monaural singing voice separation from accompanying music with multi-stage non-negative matrix factorization , 2015, 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[308]  Jianping Li,et al.  Joint optimization of recurrent networks exploiting source auto-regression for source separation , 2015, INTERSPEECH.

[309]  Roland Badeau,et al.  Singing voice detection with deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[310]  Paris Smaragdis,et al.  Adaptive Denoising Autoencoders: A Fine-Tuning Scheme to Learn from Test Mixtures , 2015, LVA/ICA.

[311]  Jon Barker,et al.  The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[312]  Antoine Liutkus,et al.  A simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[313]  François Rigaud,et al.  An Automated Source Separation Technology and Its Practical Applications , 2016 .

[314]  Romain Hennequin,et al.  Long-Term Reverberation Modeling for Under-Determined Audio Source Separation with Application to Vocal Melody Extraction , 2016, ISMIR.

[315]  Jen-Tzung Chien,et al.  Bayesian Factorization and Learning for Monaural Source Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[316]  Zhuo Chen,et al.  Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[317]  Shiqian Ma,et al.  Alternating Proximal Gradient Method for Convex Minimization , 2015, Journal of Scientific Computing.

[318]  Mark D. Plumbley,et al.  UNTWIST: A NEW TOOLBOX FOR AUDIO SOURCE SEPARATION , 2016 .

[319]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[320]  Hao Huang,et al.  Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria , 2016, ICIC.

[321]  Antoine Liutkus,et al.  Common fate model for unison source separation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[322]  Mark D. Plumbley,et al.  Single Channel Audio Source Separation using Deep Neural Network Ensembles , 2016 .

[323]  Emmanuel Vincent,et al.  Multichannel Audio Source Separation With Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[324]  Masaaki Ikehara,et al.  Vocal separation using improved robust principal component analysis and post-processing , 2016, 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS).

[325]  Mark D. Plumbley,et al.  Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks , 2016, INTERSPEECH.

[326]  Hirokazu Kameoka,et al.  Non-negative periodic component analysis for music source separation , 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).

[327]  Adrian E. Raftery,et al.  Bayesian Model Averaging: A Tutorial , 2016 .

[328]  Shigeki Sagayama,et al.  A Real-time Audio-to-audio Karaoke Generation System for Monaural Recordings Based on Singing Voice Suppression and Key Conversion Techniques , 2016, J. Inf. Process..

[329]  Stéphane Mallat,et al.  Rigid Motion Model for Audio Source Separation , 2016, IEEE Transactions on Signal Processing.

[330]  Jonathan Le Roux,et al.  Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.

[331]  Yi-Hsuan Yang,et al.  Complex and Quaternionic Principal Component Pursuit and Its Application to Audio Separation , 2016, IEEE Signal Processing Letters.

[332]  Emmanuel Vincent,et al.  Fusion Methods for Speech Enhancement and Audio Source Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[333]  Jan Schlüter,et al.  Learning to Pinpoint Singing Voice from Weakly Labeled Examples , 2016, ISMIR.

[334]  Hema A. Murthy,et al.  Group delay based music source separation using deep recurrent neural networks , 2016, 2016 International Conference on Signal Processing and Communications (SPCOM).

[335]  Gerald Schuller,et al.  New Sonorities for Jazz Recordings: Separation and Mixing using Deep Neural Networks , 2016 .

[336]  Katsutoshi Itoyama,et al.  Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[337]  Emmanuel Vincent,et al.  Multichannel music separation with deep neural networks , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).

[338]  Tijl De Bie,et al.  Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[339]  Gautham J. Mysore,et al.  Fast and easy crowdsourced perceptual audio evaluation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[340]  Jyh-Shing Roger Jang,et al.  Singing Voice Separation and Pitch Extraction from Monaural Polyphonic Audio Music via DNN and Adaptive Pitch Tracking , 2016, 2016 IEEE Second International Conference on Multimedia Big Data (BigMM).

[341]  Mark D. Plumbley,et al.  Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks , 2016, LVA/ICA.

[342]  Kyogu Lee,et al.  Singing Voice Separation Using RPCA with Weighted l_1 -norm , 2017, LVA/ICA.

[343]  Jonathan Le Roux,et al.  Deep clustering and conventional networks for music separation: Stronger together , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[344]  Mark D. Plumbley,et al.  Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[345]  Gerald Schuller,et al.  A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation , 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

[346]  Mark Hasegawa-Johnson,et al.  Speech Enhancement Using Bayesian Wavenet , 2017, INTERSPEECH.

[347]  Franck Giron,et al.  Improving music source separation based on deep neural networks through data augmentation and network blending , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[348]  M. Picheny,et al.  Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .

[349]  Tillman Weyde,et al.  Singing Voice Separation with Deep U-Net Convolutional Networks , 2017, ISMIR.

[350]  Antoine Liutkus,et al.  The 2016 Signal Separation Evaluation Campaign , 2017, LVA/ICA.

[351]  Bryan Pardo,et al.  Music/Voice separation using the 2D fourier transform , 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[352]  Mark B. Sandler,et al.  Interference reduction in music recordings combining Kernel Additive Modelling and Non-Negative Matrix Factorization , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[353]  Bryan Pardo,et al.  Predicting algorithm efficacy for adaptive multi-cue source separation , 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[354]  Yi-Hsuan Yang,et al.  Informed Group-Sparse Representation for Singing Voice Separation , 2017, IEEE Signal Processing Letters.

[355]  Naoya Takahashi,et al.  Multi-Scale multi-band densenets for audio source separation , 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[356]  Emilia Gómez,et al.  Monoaural Audio Source Separation Using Deep Convolutional Neural Networks , 2017, LVA/ICA.

[357]  Emmanuel Vincent,et al.  Audio Source Separation and Speech Enhancement , 2018 .

[358]  Antoine Liutkus,et al.  The 2018 Signal Separation Evaluation Campaign , 2018, LVA/ICA.

[359]  Yoshua Bengio,et al.  Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[360]  Dezhong Peng,et al.  Blind Source Separation , 2019, EEG Signal Processing and Feature Extraction.