Capturing translational divergences with a statistical tree-to-tree aligner

Parallel treebanks, which comprise paired source-target parse trees aligned at the sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments is compromised by an inappropriate bias towards coverage rather than precision. The need for high precision rather than broad coverage when expressing translational divergences through tree alignment stands in direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only for tree alignment itself but also for the broader area of induction of syntax-aware models for SMT.

Andy Way | Mary Hearne | John Tinsley | Ventsislav Zhechev
