Separating time-frequency sources from time-domain convolutive mixtures using non-negative matrix factorization

This paper addresses the problem of under-determined audio source separation in multichannel reverberant mixtures. We target a semi-blind scenario in which the mixing filters are assumed to be known. Source separation is performed from the time-domain mixture signals in order to accurately model the convolutive mixing process. However, the source signals are modeled as latent variables in a time-frequency domain. In a previous paper, we proposed using the modified discrete cosine transform (MDCT) as this time-frequency representation. The present paper generalizes the method to the odd-frequency short-time Fourier transform (STFT). In this domain, the source coefficients are modeled as centered complex Gaussian random variables whose variances are structured by a non-negative matrix factorization (NMF) model. Inference relies on a variational expectation-maximization (EM) algorithm. In the experiments, we discuss the choice of the source representation and show that the proposed approach outperforms two methods from the literature.
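The generative model summarized above can be illustrated with a minimal Python sketch (standard library only). It is not the authors' implementation. It only draws synthetic data from a model of this kind and performs neither separation nor variational EM. Several parts are assumptions:

- The odd-frequency DFT of a length-N frame is taken as X[k] = sum_n x[n] exp(-2iπ(k + 1/2)n/N).
- Frames are rectangular and non-overlapping, whereas a practical odd-frequency STFT would use overlapping analysis and synthesis windows.
- Mixing filters are short FIR responses truncated to the source length.
- All helper names (odd_dft, inverse_odd_dft, nmf_variances, sample_source, convolutive_mix) and all numeric values are hypothetical.

For a real frame of even length N, this transform satisfies X[N-1-k] = conj(X[k]) and no bin is paired with itself. All N/2 retained coefficients are therefore genuinely complex, unlike the real-valued DC and Nyquist bins of the standard DFT.

```python
import cmath
import math
import random


def odd_dft(x):
    """Odd-frequency DFT (assumed definition): X[k] = sum_n x[n] exp(-2j*pi*(k + 1/2)*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * (k + 0.5) * n / N) for n in range(N))
            for k in range(N)]


def inverse_odd_dft(X):
    """Inverse transform: x[n] = (1/N) sum_k X[k] exp(+2j*pi*(k + 1/2)*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * (k + 0.5) * n / N) for k in range(N)) / N
            for n in range(N)]


def nmf_variances(W, H):
    """NMF variance model: v[f][t] = sum_k W[f][k] * H[k][t] (non-negative W, H)."""
    return [[sum(W[f][k] * H[k][t] for k in range(len(H))) for t in range(len(H[0]))]
            for f in range(len(W))]


def sample_source(W, H, rng):
    """Draw a real time-domain source whose odd-DFT coefficients are centered circular
    complex Gaussians with variances v[f][t]. Hypothetical helper; frames are
    rectangular and non-overlapping, of length N = 2F (simplifying assumption)."""
    V = nmf_variances(W, H)
    n_freq, n_frames = len(V), len(V[0])
    signal = []
    for t in range(n_frames):
        half = []
        for f in range(n_freq):
            std = math.sqrt(V[f][t] / 2)  # real and imaginary parts each get variance v/2
            half.append(complex(rng.gauss(0.0, std), rng.gauss(0.0, std)))
        # Hermitian pairing k <-> N-1-k makes the synthesized frame real.
        full = half + [c.conjugate() for c in reversed(half)]
        signal.extend(z.real for z in inverse_odd_dft(full))
    return signal


def convolutive_mix(sources, filters):
    """Time-domain convolutive mixture x_i(t) = sum_j sum_tau a_ij(tau) s_j(t - tau),
    where filters[i][j] is the (known) FIR filter a_ij; filter tails are truncated."""
    length = len(sources[0])
    mixtures = []
    for row in filters:
        x = [0.0] * length
        for s, a in zip(sources, row):
            for t in range(length):
                x[t] += sum(a[tau] * s[t - tau] for tau in range(len(a)) if t - tau >= 0)
        mixtures.append(x)
    return mixtures


# Toy under-determined setup (illustrative values, not from the paper): 3 sources, 2 channels.
rng = random.Random(0)
W = [[1.0, 0.1], [0.6, 0.2], [0.1, 0.9], [0.05, 0.7]]  # F = 4 bins, K = 2 components
activations = [  # one K x T activation matrix H per source, T = 3 frames
    [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
    [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]],
    [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],
]
sources = [sample_source(W, H, rng) for H in activations]
filters = [  # filters[i][j]: hypothetical short FIR response from source j to channel i
    [[1.0, 0.3], [0.5, 0.0, 0.2], [0.8]],
    [[0.2, 0.9], [1.0, 0.1], [0.0, 0.6, 0.3]],
]
mixtures = convolutive_mix(sources, filters)
print(len(mixtures), len(mixtures[0]))  # prints: 2 24
```

Separation would invert this model: given the mixtures and the known filters, the variational EM algorithm infers the latent time-frequency coefficients of each source. That inference step is not shown here.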
