Paper Information - Design of a multi-lingual, parallel-processing statistical parsing engine

Design of a multi-lingual, parallel-processing statistical parsing engine

Since the Penn Treebank [9] became widely available, numerous statistical parsers have been developed for English, e.g. [8, 5, 3]. To varying degrees, these parsers and others, while very successful at the tasks for which they were designed, had the following limitations:

• they had a fairly fixed probabilistic structure, which could only be changed by re-coding some significant portion of the program
• they had hard-coded features specific to English
• they had hard-coded features specific to the Penn Treebank
• they were designed only for a uniprocessor environment

Daniel M. Bikel

[1] Fei Xia, et al. Comparing Lexicalized Treebank Grammars Extracted from Chinese, Korean, and English Corpora, 2000, ACL 2000.

[2] David M. Magerman. Statistical Decision-Tree Models for Parsing, 1995, ACL.

[3] Richard M. Schwartz, et al. Coping with Ambiguity and Unknown Words through Probabilistic Models, 1993, CL.

[4] Michael Collins, et al. Three Generative, Lexicalised Models for Statistical Parsing, 1997, ACL.

[5] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[6] Nianwen Xue, et al. Developing Guidelines and Ensuring Consistency for Chinese Text Annotation, 2000, LREC.

[7] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[8] David Chiang, et al. Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar, 2000, ACL.

[9] Daniel M. Bikel. A Statistical Model for Parsing and Word-Sense Disambiguation, 2000, EMNLP.

[10] David Chiang, et al. Two Statistical Parsing Models Applied to the Chinese Treebank, 2000, ACL 2000.

[11] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.