Topic adaptation for Statistical Machine Translation

We present new methods for Farsi-to-English topic adaptation in statistical machine translation. We incorporate topic information into the phrase table as sparse phrasal features, and we use sparse lexical features by determining the topic distribution of each source sentence in the development and test corpora. These sparse features cover many topic-dependent source-to-target translations. We also develop systems with features that measure the topical similarity between the source sentence and each hypothesis. These include features based on distributional profiles and two types of features that use bilingual topic models to compare the source sentence and the hypothesis through topic vectors in the source and target languages. Domain and topic adaptation are also combined to improve translation quality. Experiments are carried out on the Farsi-to-English Verbmobil and CNN datasets. BLEU improves by up to 2.0 on the Verbmobil dataset and by up to 1.17 on the CNN dataset, where several individual translation corrections are also observed.
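
The idea of scoring each hypothesis by its topical similarity to the source sentence can be illustrated with a minimal sketch. It assumes that a bilingual topic model has already produced topic vectors over a shared set of topics for the source sentence and for each hypothesis, and it uses cosine similarity as the measure. The abstract does not state which similarity measure is used, so cosine is an assumption, and the names `cosine_similarity` and `rank_hypotheses` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        # An all-zero vector carries no topic information.
        return 0.0
    return dot / (norm_p * norm_q)


def rank_hypotheses(
    source_topics: Sequence[float],
    hypotheses: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Score each (text, topic_vector) hypothesis against the source
    topic vector and return (text, score) pairs, best score first."""
    scored = [
        (text, cosine_similarity(source_topics, topics))
        for text, topics in hypotheses
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; these vectors are not taken from the paper.
source = [0.8, 0.1, 0.1]
candidates = [
    ("hypothesis A", [0.1, 0.8, 0.1]),
    ("hypothesis B", [0.7, 0.2, 0.1]),
]
ranking = rank_hypotheses(source, candidates)
```

In a phrase-based decoder such a score would be one feature among many, with its weight set during tuning, rather than a ranking criterion on its own.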
