On-line Multichannel Estimation of Source Spectral Dominance

Despite its popularity, multichannel source demixing is intrinsically limited in real-world applications by the mismatch between the convolutive mixing model and actual recordings. A varying number of sources, reverberation, diffuseness, and spatial changes are common uncertainties that need to be handled. Post-processing, generally in the form of non-linear spectral filtering, is commonly adopted to compensate for these mismatches. In this work we analyze the properties of the normalized differences between the output magnitudes of a linear spatial filter. We show that, thanks to the time-frequency sparsity of acoustic signals, the distribution of these differences can be approximately modeled by a bimodal Gaussian mixture model (GMM). An on-line constrained bimodal GMM fitting is proposed to estimate the posterior probability of source spectral dominance. The estimated posteriors can be used to produce a filtered output with very low distortion, outperforming traditional non-linear methods.
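The idea summarized above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the exact normalization, the constraints, or the update rule, so everything below is an assumption made for illustration. The sketch assumes the normalized difference is d = (|y1| - |y2|) / (|y1| + |y2|), fits a two-component 1-D GMM with a recursive (exponentially forgetting) EM update, and uses two hypothetical constraints: the component means are kept on opposite sides of zero, and the variances are floored. The names `normalized_difference`, `OnlineBimodalGMM`, `forgetting`, and `var_floor` are all hypothetical.

```python
import numpy as np


def normalized_difference(y1_mag, y2_mag, eps=1e-12):
    """Assumed normalization: values near +1 mean output 1 dominates, near -1 output 2."""
    return (y1_mag - y2_mag) / (y1_mag + y2_mag + eps)


class OnlineBimodalGMM:
    """Two-component 1-D GMM tracked with recursive EM (illustrative sketch only)."""

    def __init__(self, forgetting=0.05, var_floor=1e-3):
        self.forgetting = forgetting  # hypothetical smoothing factor in (0, 1]
        self.var_floor = var_floor    # hypothetical lower bound on the variances
        self.w = np.array([0.5, 0.5])
        self.mu = np.array([-0.5, 0.5])
        self.var = np.array([0.1, 0.1])
        # Running sufficient statistics, initialized from the starting parameters.
        self.s0 = self.w.copy()
        self.s1 = self.w * self.mu
        self.s2 = self.w * (self.var + self.mu ** 2)

    def posterior(self, d):
        """Return an (N, 2) array of component posteriors for the samples d."""
        d = np.atleast_1d(np.asarray(d, dtype=float))[:, None]
        log_lik = -0.5 * ((d - self.mu) ** 2 / self.var
                          + np.log(2.0 * np.pi * self.var))
        log_post = np.log(self.w) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

    def update(self, d):
        """Process one frame of normalized differences and adapt the model.

        Returns the posterior that output 1 is dominant in each bin,
        computed with the parameters held before this frame's update.
        """
        d = np.atleast_1d(np.asarray(d, dtype=float))
        post = self.posterior(d)
        a = self.forgetting
        # E-step statistics for this frame, blended into the running ones.
        self.s0 = (1 - a) * self.s0 + a * post.mean(axis=0)
        self.s1 = (1 - a) * self.s1 + a * (post * d[:, None]).mean(axis=0)
        self.s2 = (1 - a) * self.s2 + a * (post * d[:, None] ** 2).mean(axis=0)
        s0 = np.maximum(self.s0, 1e-6)  # guard against an empty component
        # M-step with the assumed constraints.
        self.w = s0 / s0.sum()
        mu = self.s1 / s0
        self.mu = np.array([min(mu[0], 0.0), max(mu[1], 0.0)])
        # Second moment around the (possibly clipped) mean, then floored.
        var = self.s2 / s0 - 2.0 * self.mu * self.s1 / s0 + self.mu ** 2
        self.var = np.maximum(var, self.var_floor)
        return post[:, 1]
```

In use, one would call `update` once per time frame on the normalized differences of the frequency bins and multiply the magnitude of the first spatial-filter output by the returned posteriors to obtain a soft spectral mask. Whether a single model is shared across frequency or one is kept per bin is a design choice that this sketch leaves open.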
