Multichannel audio separation by direction of arrival based spatial covariance model and non-negative matrix factorization

This paper studies multichannel audio separation using non-negative matrix factorization (NMF) combined with a new model for spatial covariance matrices (SCMs). The proposed SCM model is parameterized by the source direction of arrival (DoA), and its parameters can be optimized to yield a solution that is spatially coherent over frequencies, thus avoiding permutation ambiguity and spatial aliasing. The model constrains the estimation of SCMs to a set of geometrically possible solutions. Additionally, we present a method for initializing the parameters of the proposed model using a priori DoA information about the sources, extracted blindly from the mixture. The simulations show that the proposed algorithm exceeds the separation quality of existing spatial separation methods.
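To illustrate what it means for an SCM to be parameterized by DoA, the following minimal sketch builds a rank-1, free-field SCM from a single far-field direction. It is not the model proposed in the paper, whose exact parameterization is not given in this abstract; the function names, the 2-D array geometry, the plane-wave assumption, and the speed of sound of 343 m/s are all assumptions made for the example. Because every frequency shares the same geometric delay, the inter-microphone phase grows linearly with frequency, which is the kind of cross-frequency coherence the abstract refers to.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(mic_positions, azimuth_rad, freq_hz, c=SPEED_OF_SOUND):
    """Far-field steering vector for a planar array (hypothetical helper).

    mic_positions: list of (x, y) coordinates in metres.
    azimuth_rad:   direction of arrival, measured from the x axis.
    """
    # Unit vector pointing from the array towards the source.
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    vec = []
    for x, y in mic_positions:
        # A microphone displaced towards the source hears the wavefront earlier,
        # so its delay relative to the origin is negative.
        delay = -(ux * x + uy * y) / c
        vec.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vec


def doa_scm(mic_positions, azimuth_rad, freq_hz, c=SPEED_OF_SOUND):
    """Rank-1 SCM a a^H implied by a single DoA at one frequency."""
    a = steering_vector(mic_positions, azimuth_rad, freq_hz, c)
    return [[ai * aj.conjugate() for aj in a] for ai in a]


# Two microphones 5 cm apart on the x axis, source along the array axis.
mics = [(0.0, 0.0), (0.05, 0.0)]
scm_1k = doa_scm(mics, azimuth_rad=0.0, freq_hz=1000.0)
scm_2k = doa_scm(mics, azimuth_rad=0.0, freq_hz=2000.0)
```

In this sketch the off-diagonal phase of `scm_2k` is twice that of `scm_1k`, since both are generated by the same delay. An unconstrained SCM estimated independently per frequency bin has no such tie between bins.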
