Paper information - Learning Chinese Bracketing Knowledge Based on a Bilingual Language Model

This paper proposes a new method for automatically acquiring Chinese bracketing knowledge from English-Chinese sentence-aligned bilingual corpora. Bilingual sentence pairs are first aligned at the level of syntactic structure by combining English parse trees with a statistical bilingual language model. Chinese bracketing knowledge is then extracted automatically from the aligned structures. Preliminary experiments show that the automatically learned knowledge accords well with manually annotated brackets. The proposed method is particularly useful for acquiring bracketing knowledge for a less-studied language that lacks the tools and resources available for a better-studied second language. Although this paper discusses experiments with Chinese and English, the method is also applicable to other language pairs.
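To make the idea of transferring structure from English to Chinese concrete, the following is a minimal sketch of bracket projection. It is not the paper's algorithm: the paper aligns structures with a statistical bilingual language model, whereas this sketch assumes a word alignment is already given and simply projects English constituent spans through it. The function name `project_brackets`, the span representation, and the toy sentence pair are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_brackets(
    english_spans: List[Span],
    alignment: Dict[int, Set[int]],
) -> List[Span]:
    """Project English constituent spans onto Chinese token positions.

    `alignment` maps an English token index to the set of Chinese token
    indices it is aligned with (assumed to be given; hypothetical input).
    A projected span is kept only if no Chinese token inside it is aligned
    to an English token outside the source constituent.
    """
    projected = []
    for start, end in english_spans:
        # Collect every Chinese position covered by this English constituent.
        targets: Set[int] = set()
        for i in range(start, end + 1):
            targets |= alignment.get(i, set())
        if not targets:
            continue
        lo, hi = min(targets), max(targets)
        # Reject the span if an outside English token aligns into [lo, hi].
        consistent = all(
            not (lo <= j <= hi)
            for i, js in alignment.items()
            if not (start <= i <= end)
            for j in js
        )
        # Single-token spans carry no bracketing information, so skip them.
        if consistent and hi > lo:
            projected.append((lo, hi))
    return sorted(set(projected))


# Toy example: "I like red apples" / "我 喜欢 红 苹果" with a monotone alignment.
english_spans = [(2, 3), (1, 3), (0, 3)]  # NP, VP, S from an English parse
alignment = {0: {0}, 1: {1}, 2: {2}, 3: {3}}
print(project_brackets(english_spans, alignment))
# [(0, 3), (1, 3), (2, 3)]
```

The consistency check is the design choice that matters here: a Chinese bracket is only proposed when the projected words form a unit that does not interleave with material from outside the English constituent, which is one simple way to avoid proposing brackets that the alignment does not support.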

[1] Dekai Wu, et al. An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words, 1995, ACL.

[2] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[3] I. Dan Melamed, et al. Models of translation equivalence among words, 2000, CL.

[4] Keh-Jiann Chen, et al. A Model for Robust Chinese Parser, 1996, Int. J. Comput. Linguistics Chin. Lang. Process.

[5] Tiejun Zhao, et al. Automatic Translation Template Acquisition Based on Bilingual Structure Alignment, 2001, Int. J. Comput. Linguistics Chin. Lang. Process.

[6] Michael Collins, et al. Three Generative, Lexicalised Models for Statistical Parsing, 1997, ACL.

[7] Dekai Wu, et al. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997, CL.

[8] Eric Brill, et al. Transformation-Based Error-Driven Parsing, 1993, IWPT.