Underdetermined convolutive blind separation of sources integrating tensor factorization and expectation maximization

Abstract Underdetermined convolutive blind source separation is a challenging problem in speech and audio processing. Many conventional separation algorithms estimate the source signals in the frequency domain, which introduces permutation ambiguities across frequency bins and degrades the separation results. In this paper, to resolve the permutation alignment and obtain better separation results, we exploit the algebraic structure of the tensor factorization model together with expectation–maximization (EM) to build a new time-frequency algorithm. Tensor factorization is well suited to estimating the mixing channel in the underdetermined convolutive case, while the EM algorithm converges faster to the desired solution and improves separation quality. In the proposed algorithm, we first estimate the mixing matrix by tensor decomposition and apply a permutation alignment algorithm to resolve the permutation problem. The model parameters are then updated with the EM algorithm to improve separation performance. In the same step, the spatial images of the source signals are obtained with Wiener filters constructed from the estimated parameters. The time-domain source signals are then recovered through the inverse short-time Fourier transform. Finally, a series of simulations shows that the proposed algorithm achieves better separation performance than several existing separation algorithms.
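The Wiener filtering step mentioned in the abstract can be illustrated with a minimal sketch for a single time-frequency point. The abstract does not specify the paper's exact signal model, so this sketch assumes a rank-1 spatial covariance for each source, built from the corresponding column of the estimated mixing matrix. The function name `wiener_spatial_images`, its parameters, and the regularization constant `eps` are hypothetical and chosen for illustration only.

```python
import numpy as np

def wiener_spatial_images(x, A, v, eps=1e-10):
    """Estimate source spatial images at one time-frequency point.

    Assumed model (not taken from the paper): x = sum_j c_j, where each
    spatial image c_j has covariance v_j * a_j a_j^H.

    x : (I,) complex mixture STFT coefficients across I channels
    A : (I, J) complex mixing matrix for this frequency bin
    v : (J,) nonnegative source variances at this time-frequency point
    Returns an array of shape (J, I): one spatial image per source.
    """
    n_channels = A.shape[0]
    # Per-source covariance R[j] = v_j * a_j a_j^H, shape (J, I, I).
    R = np.einsum('j,ij,kj->jik', v, A, A.conj())
    # Mixture covariance, lightly regularized for numerical stability.
    Rx = R.sum(axis=0) + eps * np.eye(n_channels)
    # Wiener estimate: c_j = R_j Rx^{-1} x, computed with one linear solve.
    return R @ np.linalg.solve(Rx, x)

# Usage: 2 channels, 3 sources (an underdetermined case).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
v = rng.uniform(0.5, 1.5, size=3)
s = rng.standard_normal(3) + 1j * rng.standard_normal(3)
x = A @ s
images = wiener_spatial_images(x, A, v)
```

Under this assumed model the estimated spatial images sum back to the mixture (up to the small regularization), which is a convenient sanity check. A full pipeline would apply the filter at every time-frequency point and then invert the short-time Fourier transform, as the abstract describes.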
