Iterative Transformation of Annotation Guidelines for Constituency Parsing

This paper presents an effective annotation adaptation algorithm for constituency treebanks. The algorithm transforms a treebank from one annotation guideline to another through an iterative optimization procedure, yielding a much larger treebank on which an enhanced parser can be trained without increasing model complexity. Experiments show that using the transformed Tsinghua Chinese Treebank as additional training data brings a significant improvement over a baseline trained on the Penn Chinese Treebank alone.
