Robust voice activity detection in stereo recording with crosstalk

Crosstalk in a stereo recording occurs when speech from one participant leaks into the close-talking microphones of the other participants. This crosstalk degrades voice activity detection (VAD) performance on individual channels, even though the crosstalk signal is weaker than the participant’s own speech. To address this problem, we first detect speech by applying a standard VAD scheme to the merged signal obtained by adding the signals from the two channels, and then determine the target channel using a channel selection scheme. Although VAD is performed on a short-term frame basis, we found that channel selection performance improves with long-term signal information. Experiments using stereo recordings of real conversations demonstrate that the VAD accuracy averaged over both channels improves by 22% (absolute) compared to the single-channel VAD scheme, indicating the robustness of the proposed approach to crosstalk.
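
The two-stage idea can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the VAD scheme or the channel selection rule, so the sketch assumes a simple energy threshold as a stand-in for the "standard VAD scheme" and assumes channel selection by comparing channel energies summed over a window of neighbouring frames as the "long-term" information. The function names and the parameters `frame_len`, `threshold`, and `context` are hypothetical.

```python
from typing import List, Optional


def frame_energies(signal: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_and_assign(
    left: List[float],
    right: List[float],
    frame_len: int = 160,      # assumed frame size in samples
    threshold: float = 1e-3,   # assumed energy threshold for the stand-in VAD
    context: int = 5,          # assumed number of neighbouring frames per side
) -> List[Optional[int]]:
    """Label each frame: None (no speech), 0 (left channel), 1 (right channel)."""
    # Stage 1: short-term VAD on the merged signal (sum of both channels).
    merged = [l + r for l, r in zip(left, right)]
    speech = [e > threshold for e in frame_energies(merged, frame_len)]

    # Stage 2: channel selection from energy summed over a longer window.
    e_left = frame_energies(left, frame_len)
    e_right = frame_energies(right, frame_len)
    labels: List[Optional[int]] = []
    for i, is_speech in enumerate(speech):
        if not is_speech:
            labels.append(None)
            continue
        lo = max(0, i - context)
        hi = min(len(speech), i + context + 1)
        left_wins = sum(e_left[lo:hi]) >= sum(e_right[lo:hi])
        labels.append(0 if left_wins else 1)
    return labels
```

Because detection runs on the merged signal, leaked speech is not flagged separately on the wrong channel; it is detected once and then attributed to whichever channel carries more energy over the window. Note that this simplified rule assigns every speech frame to exactly one channel, so it does not model overlapping speech.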
