Paper Information - Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Abstract: Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and systematically identified. We describe our techniques for modifying English parse trees to form resulting sentences that share more similarity with the sentences in the other languages; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence-handling can improve word-level alignment.
