Open Domain Speech Translation: From Seminars and Speeches to Lectures

This paper describes our ongoing work in domain-unlimited speech translation. We show how we developed a lecture translation system by moving from speech translation of European Parliament Plenary Sessions and seminar talks to the open domain of lectures. We started from our 2006 evaluation systems for automatic speech recognition (ASR) and statistical machine translation (SMT), developed within the framework of TC-STAR (Technology and Corpora for Speech to Speech Translation) and CHIL (Computers in the Human Interaction Loop). The paper presents the speech translation performance of these systems on lectures and gives an overview of our final real-time lecture translation system.
