The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

Despite several recent proposals for blind source separation (BSS) of realistic acoustic signals, separation performance remains insufficient, and it is especially limited when the room impulse responses are long. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not beneficial to be constrained by the condition T > P, where T is the DFT frame length and P is the length of the room impulse responses. Instead, there is an optimum frame size determined by a trade-off: the frame must be long enough to cover the reverberation, yet short enough to leave enough samples in each frequency bin for estimating statistics. We also clarify why BSS performs poorly in long reverberant environments by showing that the BSS framework works as two sets of frequency-domain adaptive beamformers. Like adaptive beamformers, BSS can reduce reverberant sound to some extent, but it mainly removes the sound arriving from the jammer direction. This is why BSS has difficulty in reverberant environments.
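The frame-size trade-off can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the signal length (24000 samples), the impulse response length P (2400 taps), the 50% frame overlap, and the function names are all hypothetical values chosen for illustration. It only counts how many frames, and hence how many samples per frequency bin, remain as T grows.

```python
def frames_per_bin(num_samples, frame_len, hop=None):
    """Count the DFT frames obtained from a signal of num_samples samples.

    Each frame contributes one sample to every frequency bin, so this is
    also the number of samples available per bin for estimating statistics.
    """
    if hop is None:
        hop = frame_len // 2  # assumption: 50% overlap between frames
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


def tradeoff_rows(num_samples, reverb_taps, frame_lens):
    """Return (T, frames per bin, whether T > P) for each candidate T."""
    return [
        (T, frames_per_bin(num_samples, T), T > reverb_taps)
        for T in frame_lens
    ]


if __name__ == "__main__":
    # Hypothetical setup: 24000 samples of data, P = 2400 taps.
    for T, n_frames, covers in tradeoff_rows(24000, 2400, (512, 1024, 2048, 4096)):
        print(f"T={T:5d}  frames per bin={n_frames:3d}  T > P: {covers}")
```

Under these assumed numbers, only the longest frame satisfies T > P, and it also leaves the fewest samples per bin. The sketch shows only the sample-count side of the trade-off; it does not predict where the optimum frame size lies.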
