The TALP&I2r SMT systems for IWSLT 2008

This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT'08 evaluation campaign. We present N-gram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we introduce are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.
