Evaluating Translational Correspondence using Annotation Projection

Recently, statistical machine translation models have begun to take advantage of higher-level linguistic structures such as syntactic dependencies. Underlying these models is an assumption about the directness of translational correspondence between sentences in the two languages; however, the extent to which this assumption is valid and useful is not well understood. In this paper, we present an empirical study that quantifies the degree to which syntactic dependencies are preserved when parses are projected directly from English to Chinese. Our results show that although the direct correspondence assumption is often too restrictive, a small set of principled, elementary linguistic transformations can boost the quality of the projected Chinese parses by 76% relative to the unimproved baseline.
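
To make the idea of direct projection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dependency parse is a set of (head, dependent) word-index pairs and that the word alignment is a one-to-one mapping from English to Chinese word positions; the function names `project_dependencies` and `preserved_fraction`, the toy sentence pair, and the simple overlap measure are all hypothetical illustrations rather than the evaluation used in the study.

```python
from typing import Dict, Set, Tuple

# A dependency is a (head index, dependent index) pair over word positions.
Dependency = Tuple[int, int]


def project_dependencies(
    english_deps: Set[Dependency],
    alignment: Dict[int, int],
) -> Set[Dependency]:
    """Copy each English dependency onto the Chinese words its endpoints align to.

    Dependencies with an unaligned head or dependent are dropped, which is
    one way a direct projection can lose structure.
    """
    projected: Set[Dependency] = set()
    for head, dependent in english_deps:
        if head in alignment and dependent in alignment:
            projected.add((alignment[head], alignment[dependent]))
    return projected


def preserved_fraction(projected: Set[Dependency], gold: Set[Dependency]) -> float:
    """Fraction of gold Chinese dependencies recovered by the projection (illustrative only)."""
    if not gold:
        return 0.0
    return len(projected & gold) / len(gold)


# Toy example (assumed, not from the paper):
# English: I(0) like(1) the(2) apples(3)
# Chinese: 我(0) 喜欢(1) 苹果(2), with "the" left unaligned.
english_deps = {(1, 0), (1, 3), (3, 2)}
alignment = {0: 0, 1: 1, 3: 2}
gold_chinese_deps = {(1, 0), (1, 2)}

projected = project_dependencies(english_deps, alignment)
score = preserved_fraction(projected, gold_chinese_deps)
```

In this toy case the dependency on the unaligned article is simply dropped and the remaining links carry over intact. Real sentence pairs involve one-to-many and many-to-one alignments, which this sketch deliberately leaves out.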
