Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting

In this paper, we study several microphone channel selection and weighting methods for robust automatic speech recognition (ASR) in noisy conditions. For channel selection, we investigate two methods, one based on the maximum likelihood (ML) criterion and the other on a minimum autoencoder reconstruction criterion. For channel weighting, we produce enhanced log Mel filterbank coefficients as a weighted sum of the coefficients of all channels. The channel weights are estimated using the ML criterion subject to constraints. We evaluate the proposed methods on the CHiME-3 noisy ASR task. Experiments show that channel weighting significantly outperforms channel selection owing to its greater flexibility. Furthermore, on real test data in which the gain of the target signal differs across channels, channel weighting performs as well as or better than MVDR beamforming, even though it does not use the phase delay information that beamforming normally relies on.
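As a rough illustration of the two ideas in the abstract, the sketch below combines per-channel log Mel filterbank features with a weighted sum and picks a single channel by maximum likelihood. It is not the paper's implementation. The abstract does not state which constraints are placed on the weights, so non-negative weights that sum to one are assumed here. A single diagonal Gaussian stands in for whatever model the ML criterion is evaluated against, and the weights are supplied by hand rather than estimated. All function names are hypothetical.

```python
import math


def normalize_weights(raw):
    """Project raw weights onto an ASSUMED constraint set:
    non-negative and summing to one."""
    clipped = [max(0.0, w) for w in raw]
    total = sum(clipped)
    if total == 0.0:
        # Fall back to uniform weights if everything was clipped to zero.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in clipped]


def weight_channels(feats, weights):
    """Weighted sum over channels.

    feats[c][t][d] is the log Mel coefficient of channel c, frame t, bin d.
    Returns enhanced[t][d].
    """
    n_frames = len(feats[0])
    n_bins = len(feats[0][0])
    return [
        [sum(w * feats[c][t][d] for c, w in enumerate(weights))
         for d in range(n_bins)]
        for t in range(n_frames)
    ]


def gaussian_loglik(frames, mean, var):
    """Total log-likelihood of frames under a diagonal Gaussian.
    The Gaussian is a stand-in model, chosen only for illustration."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_channel(feats, mean, var):
    """Return the index of the channel with the highest log-likelihood."""
    scores = [gaussian_loglik(ch, mean, var) for ch in feats]
    return max(range(len(feats)), key=scores.__getitem__)


# Two channels, one frame, two filterbank bins (toy values).
feats = [[[1.0, 2.0]], [[3.0, 4.0]]]
weights = normalize_weights([1.0, 1.0])
enhanced = weight_channels(feats, weights)
best = select_channel(feats, mean=[3.0, 4.0], var=[1.0, 1.0])
```

Selection is the special case of weighting in which one weight is 1 and the rest are 0, which is one way to read the abstract's remark that weighting is the more flexible of the two.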
