Joint audio source separation and dereverberation based on multichannel factorial hidden Markov model

This paper proposes a unified approach for jointly solving underdetermined source separation, audio event detection, and dereverberation of convolutive mixtures. For monaural source separation, one successful approach involves applying non-negative matrix factorization (NMF) to the magnitude spectrogram of a mixture signal, interpreted as a non-negative matrix. Several attempts have recently been made to extend this approach to the multichannel case in order to exploit the spatial correlation of the multichannel inputs as an additional clue for source separation. Multichannel NMF assumes that the observed signal is a mixture of a limited number of source signals, each of which has a static power spectral density scaled by a time-varying amplitude. We have previously proposed an extension of this approach in which the variations over time of the spectral density and the total power of each source are modeled by a hidden Markov model (HMM), allowing source activity detection and source separation to be solved simultaneously through model parameter inference. However, that method was based on an anechoic mixing model; the aim of this paper is to further extend the approach to handle reverberation by incorporating an echoic mixing model into the generative model of the observed signals. In an experiment on underdetermined source separation under reverberant conditions, the proposed method improved the signal-to-interference ratio by 9.61 dB over the conventional method.
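As a minimal sketch of the monaural NMF building block mentioned above (not the proposed multichannel factorial HMM itself), the following Python/NumPy snippet factorizes a magnitude spectrogram V into spectral bases W and time-varying activations H using the standard multiplicative updates for the generalized Kullback-Leibler divergence. The function name, component count, and iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nmf_kl(V, n_components=20, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (freq x time magnitude spectrogram)
    into V ~= W @ H using multiplicative updates for the generalized
    KL divergence. Names and parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_components)) + eps   # spectral basis vectors
    H = rng.random((n_components, N)) + eps   # time-varying activations
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        # Activation update: H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        # Basis update: W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H

# Usage sketch: V would be the magnitude of the mixture STFT, e.g.
# W, H = nmf_kl(np.abs(stft_of_mixture), n_components=20)
```

In the multichannel and factorial-HMM extensions described in the abstract, the static bases and free activations of this sketch are replaced by source-wise spectral templates whose activity and gain evolve according to a Markov chain, and the model is combined with a spatial (and, in this paper, echoic) mixing model.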
