Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Abstract: Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and can be systematically identified. We describe our techniques for modifying English parse trees so that the resulting sentences share more similarity with the sentences in the other language; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence handling can improve word-level alignment.
