Word-level Alignment for Multilingual Resource Acquisition

Abstract: We present a simple, one-pass word alignment algorithm for parallel text. The algorithm uses synchronous parsing and takes advantage of existing syntactic annotations. In our experiments, the model's performance is comparable to that of more complicated iterative methods. We discuss the challenges and potential benefits of using the model to train syntactic parsers for new languages.
