Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model

We address the problems of blind source separation, dereverberation, audio event detection, and direction-of-arrival (DOA) estimation. We previously proposed a generative model of multichannel signals, called the multichannel factorial hidden Markov model, which allows these problems to be solved simultaneously through a joint optimization formulation. In that approach, we modeled the spatial correlation matrix of each source as a weighted sum of the spatial correlation matrices corresponding to all possible DOAs. However, experiments in real environments showed that the estimated spatial correlation matrix tended to deviate from the actual correlation matrix, because the plane-wave assumption does not hold in the presence of reverberation and noise. To handle such deviations, we propose introducing a prior distribution over the spatial correlation matrices, called the DOA mixture model, in place of the weighted-sum model. In our experiment, the proposed method improved the signal-to-distortion ratio of the separated signals by 1.94 dB over our previous method.
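As an illustration of the weighted-sum model described above (the earlier model, not the proposed DOA mixture prior), the following minimal sketch builds a spatial correlation matrix as a weighted sum of rank-1 matrices, one per candidate DOA, under a plane-wave (far-field) assumption. The linear two-microphone geometry, the speed of sound, the DOA grid, the weights, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for illustration


def steering_vector(theta, freq_hz, mic_positions):
    """Plane-wave steering vector for a linear array (hypothetical geometry).

    theta is the DOA in radians, mic_positions are coordinates in metres
    along the array axis.
    """
    return [
        cmath.exp(-2j * math.pi * freq_hz * x * math.cos(theta) / SPEED_OF_SOUND)
        for x in mic_positions
    ]


def doa_correlation(theta, freq_hz, mic_positions):
    """Rank-1 spatial correlation matrix a(theta) a(theta)^H for one DOA."""
    a = steering_vector(theta, freq_hz, mic_positions)
    return [[ai * aj.conjugate() for aj in a] for ai in a]


def weighted_sum_correlation(weights, thetas, freq_hz, mic_positions):
    """Weighted sum of the per-DOA spatial correlation matrices."""
    m = len(mic_positions)
    total = [[0j] * m for _ in range(m)]
    for w, theta in zip(weights, thetas):
        r = doa_correlation(theta, freq_hz, mic_positions)
        for i in range(m):
            for j in range(m):
                total[i][j] += w * r[i][j]
    return total


# Assumed setup: two microphones 5 cm apart, DOA grid every 30 degrees,
# and weights concentrated around 90 degrees.
mics = [0.0, 0.05]
thetas = [math.radians(d) for d in range(0, 181, 30)]
weights = [0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0]
R = weighted_sum_correlation(weights, thetas, 1000.0, mics)
```

Every matrix this model can produce is tied to exact plane-wave steering vectors, which is the restriction the abstract identifies as problematic when reverberation and noise are present.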
