Paper information - Breaking the Resource Bottleneck for Multilingual Parsing

Breaking the Resource Bottleneck for Multilingual Parsing

Abstract: We propose a framework that enables the acquisition of annotation-heavy resources, such as syntactic dependency tree corpora, for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied.

Philip Resnik | Rebecca Hwa | Amy Weinberg
