Maximum kurtosis beamforming with the generalized sidelobe canceller

This paper presents an adaptive beamforming application for far-field speech captured from a single real speaker in a real meeting room. After the speaker's position is estimated by a speaker tracking system, we construct a subband-domain beamformer in the generalized sidelobe canceller (GSC) configuration. Conventional practice adapts the active weight vectors of the GSC to minimize output power. We instead optimize them so that the kurtosis of the output signal is maximized. The resulting beamformer can suppress noise and reverberation without the signal cancellation problems encountered in conventional adaptive beamforming algorithms. We demonstrate the effectiveness of the proposed techniques through a series of automatic speech recognition experiments on the Multi-Channel Wall Street Journal Audio Visual Corpus (MC-WSJ-AV). The proposed beamforming algorithm achieved a word error rate (WER) of 13.6%, whereas the simple delay-and-sum beamformer yielded a WER of 17.8%.
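The abstract describes the GSC structure only in words. The following is a minimal pure-Python sketch for a single subband under several assumptions that the abstract does not state. It assumes a uniform linear array with a known steering vector v, which stands in for the tracker's position estimate. The quiescent vector is delay-and-sum, w_q = v/M. The blocking matrix B uses pairwise differences, so B^H v = 0. The kurtosis definition is E|Y|^4 - beta (E|Y|^2)^2 with beta = 3. An L2 penalty alpha ||w_a||^2 is placed on the active weights. Optimization is plain gradient ascent with an accept/reject step size, which is not necessarily the optimizer the authors used. The output is Y = (w_q - B w_a)^H X, and the ascent direction is the Wirtinger gradient dJ/dw_a^* = -2 E[|Y|^2 Y^* Z] + 2 beta E[|Y|^2] E[Y^* Z] - alpha w_a, with Z = B^H X. All helper names, array geometry, and the toy signals are hypothetical: a super-Gaussian "speech" source and a Gaussian interferer.

```python
import cmath
import math
import random


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (hypothetical geometry)."""
    phase = 2.0 * math.pi * freq_hz * spacing_m * math.sin(angle_rad) / c
    return [cmath.exp(-1j * phase * m) for m in range(num_mics)]


def blocking_output(v, x):
    """Z = B^H x, where column m of B is v[m] e_m - v[m+1] e_{m+1}, so B^H v = 0."""
    return [v[m].conjugate() * x[m] - v[m + 1].conjugate() * x[m + 1]
            for m in range(len(v) - 1)]


def gsc_outputs(wa, yq, zs):
    """Y = w_q^H X - w_a^H Z for every frame."""
    return [y - sum(w.conjugate() * z for w, z in zip(wa, zv))
            for y, zv in zip(yq, zs)]


def objective(wa, yq, zs, beta, alpha):
    """Sample kurtosis E|Y|^4 - beta (E|Y|^2)^2 minus the penalty alpha ||w_a||^2."""
    ys = gsc_outputs(wa, yq, zs)
    p2 = sum(abs(y) ** 2 for y in ys) / len(ys)
    p4 = sum(abs(y) ** 4 for y in ys) / len(ys)
    return p4 - beta * p2 ** 2 - alpha * sum(abs(w) ** 2 for w in wa)


def gradient(wa, yq, zs, beta, alpha):
    """Wirtinger gradient dJ/dw_a^*, using dY/dw_a^* = -Z."""
    ys = gsc_outputs(wa, yq, zs)
    n = len(ys)
    p2 = sum(abs(y) ** 2 for y in ys) / n
    grad = []
    for m, w in enumerate(wa):
        e4 = sum(abs(y) ** 2 * y.conjugate() * zv[m] for y, zv in zip(ys, zs)) / n
        e2 = sum(y.conjugate() * zv[m] for y, zv in zip(ys, zs)) / n
        grad.append(-2.0 * e4 + 2.0 * beta * p2 * e2 - alpha * w)
    return grad


def max_kurtosis_gsc(frames, v, beta=3.0, alpha=0.01, iters=100, step=0.05):
    """Return total weights w = w_q - B w_a plus the initial and final objective."""
    M = len(v)
    wq = [vm / M for vm in v]  # delay-and-sum quiescent vector, w_q^H v = 1
    yq = [sum(w.conjugate() * xm for w, xm in zip(wq, x)) for x in frames]
    zs = [blocking_output(v, x) for x in frames]
    wa = [0j] * (M - 1)
    j_init = best = objective(wa, yq, zs, beta, alpha)
    for _ in range(iters):
        g = gradient(wa, yq, zs, beta, alpha)
        trial = [w + step * gm for w, gm in zip(wa, g)]
        val = objective(trial, yq, zs, beta, alpha)
        if val > best:  # accept the ascent step and try a larger one next time
            wa, best, step = trial, val, step * 1.2
        else:  # reject the step and shrink it
            step *= 0.5
    bw = [0j] * M  # B w_a, accumulated column by column
    for m, w in enumerate(wa):
        bw[m] += v[m] * w
        bw[m + 1] -= v[m + 1] * w
    return [a - b for a, b in zip(wq, bw)], j_init, best


def gain(w, a):
    """|w^H a|: array response of weights w toward steering vector a."""
    return abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a)))


# Toy single-subband data: super-Gaussian target at 20 deg, Gaussian interferer at -40 deg.
random.seed(0)
M, FREQ = 4, 1000.0
v_t = steering_vector(M, 0.05, math.radians(20), FREQ)
v_i = steering_vector(M, 0.05, math.radians(-40), FREQ)
frames = []
for _ in range(2000):
    s = random.expovariate(1.0) * cmath.exp(2j * math.pi * random.random())
    n = complex(random.gauss(0, math.sqrt(0.5)), random.gauss(0, math.sqrt(0.5)))
    frames.append([s * a + n * b + complex(random.gauss(0, 0.07), random.gauss(0, 0.07))
                   for a, b in zip(v_t, v_i)])

w, j_init, j_final = max_kurtosis_gsc(frames, v_t)
w_ds = [vm / M for vm in v_t]
print("target gain:", gain(w, v_t))
print("interferer gain, delay-and-sum vs. max-kurtosis GSC:", gain(w_ds, v_i), gain(w, v_i))
```

Because B^H v = 0, the look-direction signal passes with unit gain for any w_a in this idealized sketch. The optimization therefore raises kurtosis only by removing the Gaussian interferer that leaks into the blocking-matrix output. With real reverberant speech and steering errors, some target energy does leak through B. The abstract's claim is that the kurtosis criterion avoids the signal cancellation that conventional adaptation suffers in that situation.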
