Paper information - Channel selection in the short-time modulation domain for distant speech recognition

Automatic speech recognition from multiple distant microphones is challenging because of noise and reverberation. The quality of the acquired speech can vary between microphones because of speaker movement and channel distortions. This paper proposes a channel selection approach that picks reliable channels using a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of the speech from each microphone and of the speech obtained from beamforming, in terms of their modulations. The new technique is evaluated experimentally under real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) scores and word error rate (WER). With circular microphone arrays, an overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared with selecting the channel at random.
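The abstract does not give the selection criterion itself, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the channels are already time-aligned, uses a plain average as a stand-in for a delay-sum beamformer, computes the modulation spectrum over the whole utterance rather than in short-time segments, and scores each channel by its energy in an assumed 2-16 Hz modulation band relative to the beamformed reference. The function names (`stft_mag`, `modulation_energy`, `select_channel`) and all parameter values are hypothetical.

```python
import numpy as np

def stft_mag(x, frame_len=256, hop=128):
    """Magnitude STFT: rows are frames, columns are acoustic frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def modulation_energy(x, fs, frame_len=256, hop=128, mod_band=(2.0, 16.0)):
    """Energy of the spectral-magnitude trajectories inside a modulation band.

    The 2-16 Hz band is an assumption made for this sketch.
    """
    mag = stft_mag(x, frame_len, hop)
    # Remove the per-bin mean so the DC modulation component does not dominate.
    mag = mag - mag.mean(axis=0, keepdims=True)
    # A second Fourier transform along the frame (time) axis gives the
    # modulation spectrum of every acoustic frequency bin.
    mod_power = np.abs(np.fft.rfft(mag, axis=0)) ** 2
    frame_rate = fs / hop
    mod_freqs = np.fft.rfftfreq(mag.shape[0], d=1.0 / frame_rate)
    in_band = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])
    return float(mod_power[in_band].sum())

def select_channel(channels, fs):
    """Return the index of the best-scoring channel and all scores.

    Assumption: channels are pre-aligned, so averaging them stands in for
    a delay-sum beamformer with zero steering delays.
    """
    channels = np.asarray(channels, dtype=float)
    reference = channels.mean(axis=0)
    ref_energy = modulation_energy(reference, fs)
    scores = np.array(
        [modulation_energy(ch, fs) / ref_energy for ch in channels]
    )
    return int(np.argmax(scores)), scores

# Synthetic example: a 1 kHz carrier with strong vs. weak 4 Hz modulation.
fs = 8000
t = np.arange(2 * fs) / fs
carrier = np.sin(2 * np.pi * 1000 * t)
strong = (1 + 0.9 * np.sin(2 * np.pi * 4 * t)) * carrier
weak = (1 + 0.1 * np.sin(2 * np.pi * 4 * t)) * carrier
best, scores = select_channel([strong, weak], fs)
```

In this toy setup the channel with the deeper 4 Hz modulation receives the higher score. A real system would need proper time-delay estimation before beamforming and a criterion evaluated per short-time segment.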

Petr Motlícek | Sridha Sridharan | Ivan Himawan | David Dean | Dian Tjondronegoro
