Paper Information - Treebank Conversion based Self-training Strategy for Parsing

Treebank Conversion based Self-training Strategy for Parsing

In this paper, we propose a novel self-training strategy for parsing that is based on Treebank conversion (SSPTC). SSPTC combines the strengths of Treebank conversion and self-training so that each offsets the other's weaknesses. To provide the good parse selection strategy that self-training needs, we score the automatically generated parse trees using the parse trees in the source Treebank as a reference. To maintain the consistency between the source Treebank and the conversion Treebank that Treebank conversion needs, we obtain the conversion trees with the help of self-training. In our experiments, the SSPTC strategy is used to parse the Tsinghua Chinese Treebank with the help of the Penn Chinese Treebank. The results significantly outperform the baseline parser.

Chengqing Zong | Zhiguo Wang
