FBK@IWSLT 2007

This paper reports on the participation of FBK (formerly ITC-irst) in the IWSLT 2007 Evaluation. FBK participated in three tasks: Chinese-to-English, Japanese-to-English, and Italian-to-English. In contrast to last year, the translation systems were developed with the Moses Toolkit and the IRSTLM library, both available as open source software. Moreover, several novel ideas were investigated: the use of confusion networks as input to manage ambiguity in punctuation, the estimation of an additional language model from Google's Web 1T 5-gram collection, the combination of true-case and lower-case language models, and the use of multiple phrase-tables. Experiments on top of a state-of-the-art baseline showed that these methods accounted for significant BLEU score improvements.
