Treebank Conversion based Self-training Strategy for Parsing

In this paper, we propose a novel self-training strategy for parsing that is based on treebank conversion (SSPTC). SSPTC combines the strengths of treebank conversion and self-training so that each offsets the other's weaknesses. Self-training needs a good parse selection strategy, so we score the automatically generated parse trees using the parse trees in the source treebank as a reference. Treebank conversion needs consistency between the source treebank and the converted treebank, so we obtain the converted trees with the help of self-training. In our experiments, SSPTC is used to parse the Tsinghua Chinese Treebank with the help of the Penn Chinese Treebank. The results significantly outperform the baseline parser.
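The parse selection step described above can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so this example assumes unlabeled bracket F1 over (start, end) spans as the agreement measure between a candidate parse and the source treebank tree. The names `bracket_f1` and `select_parse` and the toy spans are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# A constituent is represented only by its (start, end) word span.
# Labels are ignored, on the assumption that the two treebanks use
# different label sets.
Span = Tuple[int, int]


def bracket_f1(candidate: Set[Span], reference: Set[Span]) -> float:
    """Unlabeled bracket F1 between two sets of spans (assumed scoring function)."""
    if not candidate or not reference:
        return 0.0
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def select_parse(n_best: List[Set[Span]], reference: Set[Span]) -> int:
    """Return the index of the candidate that agrees best with the reference tree."""
    scores = [bracket_f1(candidate, reference) for candidate in n_best]
    return max(range(len(n_best)), key=lambda i: scores[i])


# Hypothetical four-word sentence: spans of the source treebank tree
# and of two automatically generated candidate parses.
source_tree = {(0, 4), (0, 2), (2, 4)}
candidates = [
    {(0, 4), (1, 4), (2, 4)},  # shares two of three brackets with the reference
    {(0, 4), (0, 2), (2, 4)},  # shares all brackets with the reference
]
best = select_parse(candidates, source_tree)
```

In a self-training loop, the selected candidate would be added to the training data for the next round. How SSPTC actually handles differences between the two annotation standards is not stated in the abstract, and this sketch does not attempt to model it.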
