Learning to translate from graded and negative relevance information

We present an approach for learning to translate by exploiting cross-lingual link structure in multilingual document collections. We propose a new learning objective based on structured ramp loss that learns from graded relevance and explicitly includes negative relevance information. Our results on English-German translation of Wikipedia entries show small but significant improvements of our method over an unadapted baseline, even when only a weak relevance signal is used. We also compare our method to monolingual language model adaptation and to automatic extraction of pseudo-parallel data, and find small improvements even over these strong baselines.
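The abstract names structured ramp loss as the basis of the objective. The sketch below shows a single subgradient step of the common hope/fear form of structured ramp loss over a k-best list, using a linear model score. It is only an illustration of that generic technique. It is not the graded-relevance objective proposed here.

It rests on these assumptions:
- Each hypothesis comes with a feature vector and a cost in [0, 1].
- The cost could be derived from a relevance signal, but how the paper does this is not shown.
- The names `ramp_loss_step` and `kbest` are hypothetical.

```python
from typing import List, Sequence, Tuple

def dot(w: Sequence[float], f: Sequence[float]) -> float:
    """Linear model score w . f."""
    return sum(wi * fi for wi, fi in zip(w, f))

def ramp_loss_step(
    w: List[float],
    kbest: List[Tuple[List[float], float]],  # hypothetical: (features, cost in [0, 1])
    lr: float = 0.1,
) -> Tuple[List[float], float, int, int]:
    """One subgradient step of hope/fear structured ramp loss (generic sketch)."""
    scores = [dot(w, f) for f, _ in kbest]
    idx = range(len(kbest))
    # Hope: high model score and low cost.
    hope = max(idx, key=lambda i: scores[i] - kbest[i][1])
    # Fear: high model score but high cost.
    fear = max(idx, key=lambda i: scores[i] + kbest[i][1])
    # Ramp loss: max_y (s(y) + c(y)) - max_y (s(y) - c(y)).
    loss = (scores[fear] + kbest[fear][1]) - (scores[hope] - kbest[hope][1])
    # Subgradient w.r.t. w is f(fear) - f(hope); step against it.
    f_hope, f_fear = kbest[hope][0], kbest[fear][0]
    new_w = [wi - lr * (ff - fh) for wi, fh, ff in zip(w, f_hope, f_fear)]
    return new_w, loss, hope, fear

kbest = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.1)]
w, loss, hope, fear = ramp_loss_step([0.0, 0.0], kbest)
print(hope, fear, round(loss, 3), w)
```

In this toy example the step moves weight toward the low-cost hypothesis and away from the high-cost one. Recomputing the loss with the updated weights gives a smaller value.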
