Lexicons or phrase tables? An investigation in sampling-based multilingual alignment

Sampling-based multilingual alignment is an example-based approach to sub-sentential alignment that has been shown to outperform ubiquitous statistical models on some tasks. In machine translation, however, it typically still does not yield results as good as first expected. In this paper, we propose two experiments to determine the nature of the alignments produced by this method and what they still lack. We then deduce which improvements could make the method perform better on machine translation tasks.
