Underdetermined Reverberant Audio-Source Separation Through Improved Expectation–Maximization Algorithm

Underdetermined reverberant audio-source separation is an important problem in speech and audio processing. Many separation algorithms have been proposed for it, but they estimate the model parameters in the time–frequency domain, which leads to permutation ambiguity and poor separation performance. In addition, a crucial problem of existing expectation–maximization (EM) algorithms is that updating the model parameters at each iteration is time-consuming. In this paper, we present an improved EM algorithm that combines nonnegative matrix factorization (NMF) and time difference of arrival (TDOA) estimation, and that reduces computation time by properly selecting the initial values of the EM algorithm. In the proposed algorithm, an NMF source model is used to avoid the permutation ambiguity problem, and acoustic localization is achieved by transforming the TDOA estimates. The model parameters are then updated to obtain better separation results. Finally, the source signals are separated using Wiener filters. Experimental results show that the proposed algorithm achieves better source separation performance than existing blind separation methods.
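The abstract does not state which TDOA estimator is used or exactly how the TDOA is transformed into a source location. As a minimal sketch, assuming a single two-microphone pair and a far-field source, the code below estimates the TDOA with GCC-PHAT (a standard estimator swapped in here for illustration, not necessarily the paper's) and converts it to a direction of arrival through sin(theta) = c * tau / d. The function names `gcc_phat_tdoa` and `tdoa_to_doa`, the 0.2 m microphone spacing, the 16 kHz sampling rate, and the 343 m/s speed of sound are all hypothetical choices.

```python
import numpy as np


def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Estimate the delay of x1 relative to x2 (in seconds) with GCC-PHAT."""
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12             # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)              # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:                    # restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def tdoa_to_doa(tau, mic_distance, c=343.0):
    """Far-field direction of arrival (degrees from broadside) for one mic pair."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)   # sin(theta) = c * tau / d
    return float(np.degrees(np.arcsin(s)))


# Hypothetical example: white noise that reaches microphone 1 five samples later
fs, d = 16000, 0.2
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
x2 = s
x1 = np.concatenate((np.zeros(5), s[:-5]))
tau = gcc_phat_tdoa(x1, x2, fs, max_tau=d / 343.0)
print(round(tau * fs))            # 5 (samples)
print(tdoa_to_doa(tau, d))        # about 32.4 degrees
```

Limiting `max_tau` to d / c discards lags that no physical source could produce, which can help in reverberant rooms where spurious correlation peaks may appear at large lags.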

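To make the NMF source model and the final Wiener filtering step concrete, here is a minimal sketch under the local Gaussian model with full-rank spatial covariance matrices, one common way such multichannel EM algorithms are formulated; the abstract does not confirm this exact parametrization. Each source j is assumed to have variance v_j(f,n) = sum_k W_j(f,k) H_j(k,n) from its NMF factors and a spatial covariance R_j(f). The multichannel Wiener filter then estimates each source image as c_j(f,n) = v_j(f,n) R_j(f) Sigma_x(f,n)^{-1} x(f,n), where Sigma_x is the sum of v_k R_k over all sources. The function `multichannel_wiener`, the small regularizer `eps`, and the random toy dimensions are hypothetical.

```python
import numpy as np


def multichannel_wiener(X, V, R, eps=1e-10):
    """Multichannel Wiener estimates of the source images.

    X: (F, N, I) complex mixture STFT (F bins, N frames, I microphones)
    V: (J, F, N) nonnegative source variances, e.g. V[j] = W[j] @ H[j] (NMF)
    R: (J, F, I, I) Hermitian spatial covariance matrices
    Returns C: (J, F, N, I) estimated source images.
    """
    I = X.shape[-1]
    # Covariance of each source image: Sigma_j(f, n) = v_j(f, n) R_j(f)
    Sigma = V[..., None, None] * R[:, :, None, :, :]          # (J, F, N, I, I)
    # Mixture covariance is the sum over sources; eps * I keeps it invertible
    Sigma_x = Sigma.sum(axis=0) + eps * np.eye(I)             # (F, N, I, I)
    # Wiener gain G_j = Sigma_j Sigma_x^{-1}, broadcast over all sources
    G = Sigma @ np.linalg.inv(Sigma_x)                        # (J, F, N, I, I)
    # Apply each gain to the mixture vector x(f, n)
    return (G @ X[..., None])[..., 0]                          # (J, F, N, I)


# Hypothetical toy sizes: J = 3 sources, I = 2 microphones (underdetermined)
rng = np.random.default_rng(0)
J, F, N, I, K = 3, 4, 5, 2, 2
W = rng.random((J, F, K))          # NMF spectral templates per source
H = rng.random((J, K, N))          # NMF activations per source
V = W @ H                          # v_j(f, n) = sum_k W[j, f, k] * H[j, k, n]
A = rng.standard_normal((J, F, I, I)) + 1j * rng.standard_normal((J, F, I, I))
R = A @ A.conj().swapaxes(-1, -2) + 0.1 * np.eye(I)   # full-rank Hermitian PSD
X = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))

C = multichannel_wiener(X, V, R)
print(C.shape)                              # (3, 4, 5, 2)
print(np.allclose(C.sum(axis=0), X))        # True: the estimates add up to the mixture
```

Because the Wiener gains of all sources sum to (almost exactly) the identity matrix, the estimated source images add back up to the mixture, which is a convenient sanity check for any implementation of this step.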