Self-Training PCFG Grammars with Latent Annotations Across Languages

We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak's lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves state-of-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).

Mary P. Harper | Zhongqiang Huang

[1] Eugene Charniak, et al. Statistical Parsing with a Context-Free Grammar and Word Statistics, 1997, AAAI/IAAI.

[2] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[3] David Chiang, et al. Two Statistical Parsing Models Applied to the Chinese Treebank, 2000, ACL 2000.

[4] Roger Levy, et al. Is it Harder to Parse Chinese, or the Chinese Treebank?, 2003, ACL.

[5] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.

[6] Mark Steedman, et al. Bootstrapping statistical parsers from small datasets, 2003, EACL.

[7] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[8] Martha Palmer, et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus, 2005, Natural Language Engineering.

[9] Jun'ichi Tsujii, et al. Probabilistic CFG with Latent Annotations, 2005, ACL.

[10] Roger Levy, et al. Tregex and Tsurgeon: tools for querying and manipulating tree data structures, 2006, LREC.

[11] Eugene Charniak, et al. Effective Self-Training for Parsing, 2006, NAACL.

[12] Dan Klein, et al. Learning Accurate, Compact, and Interpretable Tree Annotation, 2006, ACL.

[13] Dan Klein, et al. Improved Inference for Unlexicalized Parsing, 2007, NAACL.

[14] Ari Rappoport, et al. Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets, 2007, ACL.

[15] Wen Wang, et al. Mandarin Part-of-Speech Tagging and Discriminative Reranking, 2007, EMNLP.

[16] Eugene Charniak, et al. When is Self-Training Effective for Parsing?, 2008, COLING.

[17] Christopher D. Manning, et al. Optimizing Chinese Word Segmentation for Machine Translation Performance, 2008, WMT@ACL.

[18] Dan Klein, et al. Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing, 2008, EMNLP.

[19] Xavier Carreras, et al. Simple Semi-supervised Dependency Parsing, 2008, ACL.

[20] Dale Schuurmans, et al. Semi-Supervised Convex Training for Dependency Parsing, 2008, ACL.

[21] Liang Huang, et al. Forest Reranking: Discriminative Parsing with Non-Local Features, 2008, ACL.

[22] M. Harper, et al. Chinese Statistical Parsing, 2009.