Underdetermined blind separation and tracking of moving sources based on DOA-HMM

This paper deals with the problem of the underdetermined blind separation and tracking of moving sources. In practical situations, sound sources such as human speakers can move freely, so blind separation algorithms must be designed to track the temporal changes of the impulse responses. We propose solving this problem through posterior inference of the parameters of a generative model of an observed multichannel signal, formulated under the assumptions of the sparsity of the time-frequency components of speech and the continuity of the speakers' movements. Specifically, we describe a generative model of the mixture signals that incorporates a generative model of the time-varying frequency array response for each source, described using a path-restricted hidden Markov model (HMM). Each hidden state of this HMM represents the direction of arrival (DOA) of a source, and so we call it a "DOA-HMM." Through posterior inference of the overall generative model, we can simultaneously track the DOAs of the sources, separate the source signals, and perform permutation alignment. Experiments showed that the proposed algorithm provided a 6.20 dB improvement over the conventional method in terms of the signal-to-interference ratio.
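The central modelling idea is the path-restricted DOA-HMM: the DOA range is discretized into states, and the transition structure only allows a source to stay in its current DOA state or move to a neighbouring one between frames, which encodes the continuity of speaker movement. The following is a minimal illustrative sketch, not the paper's implementation: it builds such a band-diagonal transition matrix and runs forward-backward smoothing to obtain per-frame DOA posteriors for a single source. The function names, the 36-bin DOA grid, the stay_prob value, and the synthetic per-frame DOA likelihoods are all assumptions made for illustration; in the paper these likelihoods would arise from the sparse time-frequency observation model.

```python
# Illustrative sketch only (assumed names and parameters, not the authors' code):
# a "path-restricted" HMM over discretized DOA states, where transitions are
# limited to staying put or stepping to an adjacent DOA bin, modelling the
# continuity of a moving speaker's trajectory.
import numpy as np

def path_restricted_transitions(n_states, stay_prob=0.9):
    """Band-diagonal transition matrix: stay, or step to an adjacent DOA bin."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = stay_prob
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        for j in neighbours:
            A[i, j] = (1.0 - stay_prob) / len(neighbours)
    return A

def forward_backward(log_lik, A, pi):
    """Posterior state probabilities p(z_t | observations) for one source."""
    T, K = log_lik.shape
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = lik[t] * (alpha[t - 1] @ A)
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = A @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    K, T = 36, 200                             # 36 DOA bins (5-degree grid), 200 frames
    rng = np.random.default_rng(0)
    # Synthetic slowly drifting DOA track and noisy per-frame DOA evidence
    # (placeholder for the likelihoods the sparse observation model would give).
    true_track = np.clip(np.cumsum(rng.integers(-1, 2, T)) + K // 2, 0, K - 1)
    log_lik = -0.5 * ((np.arange(K)[None, :] - true_track[:, None]) / 2.0) ** 2
    log_lik += 0.3 * rng.standard_normal((T, K))
    A = path_restricted_transitions(K)
    post = forward_backward(log_lik, A, np.full(K, 1.0 / K))
    print("estimated DOA bins (first 10 frames):", post.argmax(axis=1)[:10])
```

In the full method, such DOA posteriors would be inferred jointly with the time-frequency masks of all sources, which is what allows tracking, separation, and permutation alignment to be handled in a single inference procedure.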
