Paper information - Speech recognition for machine translation in Quaero

This paper describes the speech-to-text systems that produced the automatic transcriptions for the Quaero 2010 evaluation of machine translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. For both French and German, the ASR transcript is a ROVER combination of the systems from three teams (KIT, RWTH, LIMSI+VR). The case-sensitive word error rates (WER) of the combined systems on the 2010 evaluation data were 20.8% for French and 18.1% for German, relative WER reductions of 14.6% and 17.4%, respectively, over the best component system.
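To make the reported figures concrete, the following is a minimal sketch of how a case-sensitive WER and a relative WER reduction are typically computed. It is not the scoring tool used in the evaluation, and the function names (`word_error_rate`, `relative_reduction`, `implied_baseline`) are hypothetical. The baseline WERs it prints are not stated in the abstract; they are only what the reported combined WERs and relative reductions would imply, assuming the usual definition (baseline - combined) / baseline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Case-sensitive WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Relative WER reduction of the combined system over a baseline."""
    return (baseline_wer - combined_wer) / baseline_wer


def implied_baseline(combined_wer: float, rel_reduction: float) -> float:
    """Baseline WER implied by a combined WER and a relative reduction."""
    return combined_wer / (1.0 - rel_reduction)


if __name__ == "__main__":
    # Case sensitivity: "Cat" and "cat" count as different words.
    print(word_error_rate("the cat sat", "the Cat sat"))
    # Implied best-component WERs (assumption: derived, not reported in the abstract).
    for language, wer, reduction in [("French", 20.8, 0.146), ("German", 18.1, 0.174)]:
        print(language, round(implied_baseline(wer, reduction), 1))
```

Under these assumptions, the implied best single-system WERs are roughly 24.4% for French and 21.9% for German; the abstract itself reports only the combined WERs and the relative reductions.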
