Applying Source Separation to Music

Separating existing audio into remixable elements is useful for repurposing music recordings. Applications include upmixing video soundtracks to surround sound (e.g. home theater 5.1 systems), facilitating music transcription, enabling better mashups and remixes for disc jockeys, and rebalancing the sound levels of multiple instruments or voices recorded simultaneously to a single track. In this chapter, we provide an overview of the algorithms and approaches designed to address the challenges and opportunities specific to music. Where applicable, we also point out commonalities and links to source separation for video soundtracks, since many musical scenarios involve them (e.g. YouTube recordings of live concerts, movie soundtracks). While space prohibits describing every method in detail, we go into detail on representative music-specific algorithms and approaches not covered in other chapters. The intent is to give the reader a high-level understanding of how key exemplars of the source separation approaches applied in this domain work.
