Developing High-Performance ASR in the IBM Multilingual Speech-to-Speech Translation System

This paper presents our recent development of the real-time speech recognition component in the IBM English/Iraqi Arabic speech-to-speech translation system for the DARPA Transtac project. We describe the acoustic and language modeling that leads to high recognition accuracy and noise robustness, and we report the system's performance on evaluation sets of spontaneous conversational speech. We also introduce the streaming decoding structure and several speedup techniques that achieve the best recognition accuracy at a recognition speed of about 0.3× real time (RT).
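The speed figure above is expressed relative to real time. As a minimal sketch, assuming the conventional definition of the real-time factor (decoding time divided by audio duration), the arithmetic behind "about 0.3× RT" can be written as follows. The function names and the 10-second utterance are hypothetical illustrations, not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = decoding time / audio duration (assumed conventional definition).

    Values below 1.0 mean the recognizer runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_budget(audio_seconds: float, rtf: float) -> float:
    """Seconds of compute available to decode `audio_seconds` of speech at a target RTF."""
    return audio_seconds * rtf


# Hypothetical example: a 10-second utterance decoded at 0.3x RT.
budget = processing_budget(10.0, 0.3)
print(f"compute budget: {budget:.1f} s")
print(f"resulting RTF: {real_time_factor(budget, 10.0):.2f}")
```

Under this definition, a recognizer running at 0.3× RT finishes a 10-second utterance in roughly 3 seconds, leaving headroom for the translation and synthesis stages of a speech-to-speech pipeline.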

Wei Zhang, Xiaodong Cui, Liang Gu, Yuqing Gao, Bing Xiang