The UvA system description for IWSLT 2010

We describe the machine translation system of the University of Amsterdam, which was used to decode the Chinese→English test sets of the DIALOG task. It combines a typical phrase-based translation model, an SRILM 5-gram language model, lexicalized and distance-based distortion models, and a word penalty model. These models are adjusted by a model adaptation technique based on identifying subdomains of the provided data sets.
