Paper Information - A Two-Channel Acoustic Front-End for Robust Automatic Speech Recognition in Noisy and Reverberant Environments

A Two-Channel Acoustic Front-End for Robust Automatic Speech Recognition in Noisy and Reverberant Environments

This contribution proposes an acoustic front-end for robust automatic speech recognition in noisy and reverberant environments. It comprises a blind source separation-based signal extraction scheme and requires only two microphone signals. The proposed front-end and its integration into the recognition system are analyzed and evaluated in noisy living room-like environments according to the PASCAL CHiME challenge. The results show that the introduced system significantly improves recognition performance compared to the challenge baseline.
