A General Modular Framework for Audio Source Separation

Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and by the characteristics of the sources and the mixing process. In this paper we introduce a general modular audio source separation framework based on a library of flexible source models that enable the incorporation of prior knowledge about the characteristics of each source. First, this framework generalizes several existing audio source separation methods while providing a common formulation for them. Second, it makes it possible to devise and implement new efficient methods that have not yet been reported in the literature. We first introduce the framework by describing the flexible model, explaining its generality, and summarizing our modular implementation based on a Generalized Expectation-Maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it, in several new and existing configurations, to different source separation scenarios.
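As a minimal illustration of the kind of machinery such frameworks build on, the sketch below performs supervised single-channel separation with Itakura-Saito NMF source models followed by Wiener-style masking. This is not the paper's actual implementation: it assumes sources add in the power-spectrogram domain, uses synthetic data, and all names (`isnmf`, dictionary sizes, iteration counts) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def isnmf(V, W, H, n_iter=200, update_W=True):
    """Multiplicative updates minimizing the Itakura-Saito divergence D(V | WH)."""
    eps = 1e-12
    for _ in range(n_iter):
        Vh = W @ H + eps
        H *= (W.T @ (V / Vh**2)) / (W.T @ (1.0 / Vh) + eps)
        if update_W:
            Vh = W @ H + eps
            W *= ((V / Vh**2) @ H.T) / ((1.0 / Vh) @ H.T + eps)
    return W, H

# Synthetic power spectrograms for two sources with distinct spectral envelopes
# (source 1 low-frequency heavy, source 2 high-frequency heavy).
F, N, K = 20, 100, 2
S1 = (np.abs(rng.normal(size=(F, K))) * np.linspace(1.0, 0.1, F)[:, None]) \
     @ np.abs(rng.normal(size=(K, N)))
S2 = (np.abs(rng.normal(size=(F, K))) * np.linspace(0.1, 1.0, F)[:, None]) \
     @ np.abs(rng.normal(size=(K, N)))
V = S1 + S2  # mixture, assuming sources are additive in power

# Learn a per-source dictionary from isolated spectra (the "prior knowledge"),
# then keep the dictionaries fixed and fit only the activations on the mixture.
W1, _ = isnmf(S1, np.abs(rng.normal(size=(F, K))), np.abs(rng.normal(size=(K, N))))
W2, _ = isnmf(S2, np.abs(rng.normal(size=(F, K))), np.abs(rng.normal(size=(K, N))))
W = np.hstack([W1, W2])
_, H = isnmf(V, W, np.abs(rng.normal(size=(2 * K, N))), update_W=False)

# Wiener-style mask from each source's modeled variance, applied to the mixture.
V1_hat, V2_hat = W1 @ H[:K], W2 @ H[K:]
mask1 = V1_hat / (V1_hat + V2_hat + 1e-12)
est1 = mask1 * V  # estimate of source 1's power spectrogram
```

In a full Gaussian framework, these multiplicative updates would be one possible M-step inside a (Generalized) EM loop, and the mask would be applied to the complex STFT rather than to power spectra; the sketch keeps only the variance-model and masking ideas.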
