Unsupervised Approach to Music Source Separation using Generalized Dirichlet Prior

Music source separation aims to extract and reconstruct the individual instrument sounds that constitute a mixture. It has received a great deal of attention recently because of its importance in audio signal processing. Beyond stand-alone applications such as noise reduction and instrument-wise equalization, source separation can directly affect the performance of music information retrieval algorithms when used as a pre-processing step. However, conventional source separation algorithms have not shown satisfactory performance, especially without the aid of spatial or musical information about the target source. To address this problem, we focus on the spectral and temporal characteristics of sounds that can be observed in the spectrogram. Spectrogram decomposition is a commonly used technique for exploiting such characteristics; so far, however, only a few simple characteristics such as sparsity have been usable, because most characteristics are difficult to express in algorithmic form. The main goal of this thesis is to investigate the possibility of using a generalized Dirichlet prior to constrain the spectral and temporal bases of spectrogram decomposition algorithms. Because the generalized Dirichlet prior is both simple and flexible, it allows more characteristics to be exploited within spectrogram decomposition frameworks. We apply the generalized Dirichlet prior to a range of tasks, from harmonic-percussive sound separation to harmonic instrument sound separation, and verify both its flexibility and its good performance.
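To make the idea of constraining decomposition bases with a prior more concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a PLCA-style (KL-divergence) decomposition of a non-negative magnitude spectrogram V into spectral bases W and temporal activations H, and it substitutes an ordinary symmetric Dirichlet prior with a single parameter alpha for the generalized Dirichlet prior, whose exact form is not given in the abstract. The function name plca_dirichlet and all of its parameters are hypothetical. With alpha = 1 the update reduces to the unconstrained decomposition; alpha < 1 pushes the spectral bases toward sparsity, and alpha > 1 smooths them.

```python
import numpy as np


def plca_dirichlet(V, n_components=2, alpha=1.0, n_iter=100, seed=0):
    """Decompose a non-negative spectrogram V (freq x time) as V ~= W @ H.

    Illustrative sketch only: a symmetric Dirichlet prior with parameter
    `alpha` is placed on each spectral basis (column of W). This is an
    assumed stand-in, not the thesis's generalized Dirichlet prior.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    eps = 1e-12

    # Each column of W is a distribution over frequency bins.
    W = rng.random((n_freq, n_components)) + eps
    W /= W.sum(axis=0, keepdims=True)
    # H holds the per-frame energy of each component.
    H = rng.random((n_components, n_time)) + eps

    for _ in range(n_iter):
        # E-step, folded into a ratio: how much V exceeds the current model.
        R = V / (W @ H + eps)

        # Expected "counts" assigned to each basis and activation entry,
        # both computed from the same (old) W and H.
        W_counts = W * (R @ H.T)
        H_counts = H * (W.T @ R)

        # MAP M-step for W: add (alpha - 1) pseudo-counts and clip at zero,
        # a common heuristic when alpha < 1 makes some entries negative.
        W = np.maximum(W_counts + alpha - 1.0, 0.0)
        W /= W.sum(axis=0, keepdims=True) + eps

        # Maximum-likelihood M-step for H (no prior on the activations here).
        H = H_counts

    return W, H
```

In use, V would typically be the magnitude of a short-time Fourier transform, and a source estimate could be obtained by masking the mixture spectrogram with the contribution of a subset of components. How components are assigned to sources, and what prior shape suits each source, is exactly the part this sketch leaves open.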
