Ieee Transactions on Audio, Speech and Language Processing Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation

This paper addresses the problem of sound source separation from a multichannel microphone array capture via estimation of source spatial covariance matrix (SCM) of a short-time Fourier transformed mixture signal. In many conventional audio separation algorithms the source mixing parameter estimation is done separately for each frequency thus making them prone to errors and leading to suboptimal source estimates. In this paper we propose a SCM model which consists of a weighted sum of direction of arrival (DoA) kernels and estimate only the weights dependent on the source directions. In the proposed algorithm, the spatial properties of the sources become jointly optimized over all frequencies, leading to more coherent source estimates and mitigating the effect of spatial aliasing at high frequencies. The proposed SCM model is combined with a linear model for magnitudes and the parameter estimation is formulated in a complex-valued non-negative matrix factorization (CNMF) framework. Simulations consist of recordings done with a hand-held device sized array having multiple microphones embedded inside the device casing. Separation quality of the proposed algorithm is shown to exceed the performance of existing state of the art separation methods with two sources when evaluated by objective separation quality metrics.

[1]  Ville Pulkki,et al.  Spatial Sound Reproduction with Directional Audio Coding , 2007 .

[2]  Emmanuel Vincent,et al.  First Stereo Audio Source Separation Evaluation Campaign: Data, Algorithms and Results , 2007, ICA.

[3]  Jürgen Herre,et al.  Interactive Teleconferencing Combining Spatial Audio Object Coding and DirAC Technology , 2010 .

[4]  Xindong Wu,et al.  A new descriptive clustering algorithm based on Nonnegative Matrix Factorization , 2008, 2008 IEEE International Conference on Granular Computing.

[5]  Derry Fitzgerald,et al.  Sound Source Separation Using Shifted Non-Negative Tensor Factorisation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[7]  Hirokazu Kameoka,et al.  Formulations and algorithms for multichannel complex NMF , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Jerome Daniel,et al.  Further Investigations of High-Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging , 2003 .

[9]  Ivan Tashev,et al.  Sound Capture and Processing: Practical Approaches , 2009 .

[10]  Tom Barker,et al.  Non-negative tensor factorisation of modulation spectrograms for monaural sound source separation , 2013, INTERSPEECH.

[11]  Özgür Yilmaz,et al.  Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[12]  Scott Rickard,et al.  Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[13]  Ehud Weinstein,et al.  Signal enhancement using beamforming and nonstationarity with applications to speech , 2001, IEEE Trans. Signal Process..

[14]  Anja Walter,et al.  Audio Signal Processing For Next Generation Multimedia Communication Systems , 2016 .

[15]  Juha Merimaa,et al.  Spatial Impulse Response Rendering I: Analysis and Synthesis , 2005 .

[16]  Emmanuel Vincent,et al.  Subjective and Objective Quality Assessment of Audio Source Separation , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  Andrzej Cichocki,et al.  Nonnegative Matrix and Tensor Factorization T , 2007 .

[18]  Arun Ross,et al.  Microphone Arrays , 2009, Encyclopedia of Biometrics.

[19]  Hirokazu Kameoka,et al.  Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Gary W. Elko,et al.  Spherical Microphone Arrays for 3D Sound Recording , 2004 .

[21]  Dennis R. Morgan,et al.  A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  Paris Smaragdis,et al.  Extraction of Speech from Mixture Signals , 2012, Techniques for Noise Robustness in Automatic Speech Recognition.

[23]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[25]  E. Lehmann,et al.  Prediction of energy decay in room impulse responses simulated with an image-source model. , 2008, The Journal of the Acoustical Society of America.

[26]  Hirokazu Kameoka,et al.  New formulations and efficient algorithms for multichannel NMF , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[27]  Shoichi Koyama,et al.  Design of transform filter for reproducing arbitrarily shifted sound field using phase-shift of spatio-temporal frequency , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[29]  Rémi Gribonval,et al.  Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[30]  Paris Smaragdis,et al.  Convolutive Speech Bases and Their Application to Supervised Speech Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[31]  Hiroshi Sawada,et al.  Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[32]  Pierre Vandergheynst,et al.  Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation , 2010, 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010).

[33]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[34]  P. Svaizer,et al.  Multiple TDOA estimation by using a state coherence transform for solving the permutation problem in frequency-domain BSS , 2008, 2008 IEEE Workshop on Machine Learning for Signal Processing.

[35]  Ville Pulkki,et al.  Virtual Sound Source Positioning Using Vector Base Amplitude Panning , 1997 .

[36]  Axel Röbel,et al.  Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[37]  Tuomas Virtanen,et al.  Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine , 2005, 2005 13th European Signal Processing Conference.

[38]  Gary W. Elko,et al.  A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[39]  Hiroshi Sawada,et al.  Grouping Separated Frequency Components by Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[40]  O. Hoshuyama,et al.  A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[41]  Tuomas Virtanen,et al.  Permutation alignment of frequency-domain ICA by the maximization of intra-source envelope correlations , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[42]  Seungjin Choi,et al.  Independent Component Analysis , 2009, Handbook of Natural Computing.