Improving the Separation of Concurrent Speech through Residual Echo Suppression

This paper investigates the use of acoustic echo cancellation components in a speech separation system. The baseline system uses a classical beamformer architecture, which separates the speech of different speakers based on their spatial diversity. To suppress concurrent speech more effectively, we add a residual echo suppression stage, a technique originally developed for acoustic echo cancellation. The speech separation performance of the proposed system is evaluated by means of automatic speech recognition experiments. The results show a clear improvement over standard beamforming and postfiltering approaches: the proposed system achieves a word error rate of 44.2%, compared to 68.1% for a superdirective beamformer (SDB) and 59.8% for an SDB with a Zelinski postfilter.
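
The abstract does not state the gain rule used in the residual echo suppression stage. As a rough illustration of the general idea only, the sketch below computes per-frequency-bin gains in a generic spectral-subtraction style: the output of a beamformer steered at the competing speaker serves as a reference for the interference that leaks into the target beamformer output. The function name `suppression_gains`, the `leakage` factor, and the gain `floor` are assumptions made for this example and are not taken from the paper.

```python
def suppression_gains(target_power, interferer_power, leakage=1.0, floor=0.1):
    """Per-bin gains that attenuate bins dominated by the competing speaker.

    target_power:     power per frequency bin at the output of the beamformer
                      steered at the desired speaker
    interferer_power: power per frequency bin at the output of the beamformer
                      steered at the competing speaker
    leakage:          assumed fraction of interferer power that leaks into the
                      target output (hypothetical parameter)
    floor:            minimum gain, limits distortion from over-suppression
                      (hypothetical parameter)
    """
    gains = []
    for p_target, p_interf in zip(target_power, interferer_power):
        if p_target <= 0.0:
            # No energy in this bin: fall back to the gain floor.
            gains.append(floor)
            continue
        # Estimated residual interference power in the target output.
        residual = leakage * p_interf
        # Spectral-subtraction-style gain, clamped to [floor, 1].
        gain = 1.0 - residual / p_target
        gains.append(max(floor, min(1.0, gain)))
    return gains


if __name__ == "__main__":
    # Three bins: target-dominated, interferer-dominated, and empty.
    print(suppression_gains([4.0, 1.0, 0.0], [1.0, 2.0, 1.0], leakage=0.5))
```

Each gain would then be multiplied with the corresponding bin of the target beamformer's spectrum. Bins where the estimated interference is small stay close to unity gain, while bins dominated by the competing speaker are pushed down to the floor.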
