Scalable audio separation with light Kernel Additive Modelling

Recently, Kernel Additive Modelling (KAM) was proposed as a unified framework for multichannel audio source separation. Its main feature is the use of kernel models to describe the spectrograms of the sources locally. Such kernels can capture source features such as repetition, stability over time and/or frequency, and self-similarity. KAM notably subsumes many popular and effective state-of-the-art methods, including REPET and harmonic/percussive separation with median filters. In its initial form, however, it has an important drawback: its memory usage scales poorly with the number of sources. Indeed, KAM requires storing a full-resolution spectrogram for each source, which may become prohibitive for full-length tracks or for many sources. In this paper, we show how KAM can be combined with a fast algorithm for compressing its parameters to address this scalability issue, thus enabling its use on small platforms or mobile devices.
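
To make the memory argument concrete, the sketch below compares the cost of storing one dense spectrogram per source with the cost of storing a low-rank factorisation of each. It is only an illustration under stated assumptions: the abstract does not say which compression algorithm is used, so a plain truncated SVD stands in for it here, and the helper names (`dense_bytes`, `low_rank_bytes`, `compress`, `decompress`) as well as the spectrogram sizes are hypothetical.

```python
import numpy as np


def dense_bytes(n_sources, n_freq, n_frames, itemsize=8):
    """Bytes needed to keep one full-resolution spectrogram per source."""
    return n_sources * n_freq * n_frames * itemsize


def low_rank_bytes(n_sources, n_freq, n_frames, rank, itemsize=8):
    """Bytes needed when each spectrogram is kept as two rank-`rank` factors."""
    return n_sources * rank * (n_freq + n_frames) * itemsize


def compress(spec, rank):
    """Truncated SVD (an assumed stand-in for the paper's compression step).

    Returns a (n_freq, rank) factor and a (rank, n_frames) factor.
    """
    u, s, vt = np.linalg.svd(spec, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


def decompress(left, right):
    """Rebuild the approximate spectrogram from its two factors."""
    return left @ right


# Synthetic nonnegative spectrogram that is exactly rank 4 (illustrative data).
rng = np.random.default_rng(0)
spec = rng.random((257, 4)) @ rng.random((4, 500))

left, right = compress(spec, rank=4)
rel_err = np.linalg.norm(spec - decompress(left, right)) / np.linalg.norm(spec)

# Storage for 4 sources, 1025 frequency bins, 10000 frames, float64 entries.
full = dense_bytes(4, 1025, 10000)
light = low_rank_bytes(4, 1025, 10000, rank=20)
```

With these assumed sizes, dense storage takes 328,000,000 bytes while rank-20 factors take 7,056,000 bytes. Real spectrograms are not exactly low rank, so the rank would in practice trade memory against separation quality.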
