Parsing TCT with a Coarse-to-fine Approach

A key observation is that concept compound constituent labels are detrimental to parsing performance. We use a PCFG parsing algorithm based on a multilevel coarse-to-fine scheme. The approach requires a sequence of nested partitions (equivalence classes) of the PCFG nonterminals, such that the nonterminals of each coarser PCFG are clusters of nonterminals of the next finer PCFG. The results of parsing at a coarser level are used to prune the next finer level, so the coarse-to-fine method uses hierarchical projections for incremental pruning. We present experiments showing that parsing with hierarchical state-splitting is fast and accurate on the Tsinghua Chinese Treebank (TCT). In addition, we propose a multiple-model method that adds concept compound labels to the output of the simple PCFG model and thereby benefits from the higher bracketing recall of the simple model. This scheme can be implemented by training two models on different labeling styles.
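The pruning step described above can be illustrated with a minimal sketch. It assumes a single coarse-to-fine level, posterior probabilities already computed for coarse chart items, and a fixed threshold. The label clustering, the function name `allowed_fine_items`, and the threshold value are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]

# Hypothetical nested partition: each fine nonterminal belongs to exactly
# one coarse cluster. The labels and clusters here are illustrative only.
FINE_TO_COARSE: Dict[str, str] = {
    "np": "N", "sp": "N", "tp": "N",
    "vp": "V", "ap": "V",
    "dj": "S",
}


def allowed_fine_items(
    coarse_posteriors: Dict[Tuple[int, int, str], float],
    fine_to_coarse: Dict[str, str],
    threshold: float,
) -> Dict[Span, Set[str]]:
    """Return, per span, the fine labels that survive coarse pruning.

    A fine label is kept for span (i, j) only if its coarse cluster has a
    posterior of at least `threshold` on that span in the coarse chart.
    """
    allowed: Dict[Span, Set[str]] = {}
    for (i, j, coarse), posterior in coarse_posteriors.items():
        if posterior < threshold:
            continue  # the coarse item is pruned, and so are all its refinements
        for fine, cluster in fine_to_coarse.items():
            if cluster == coarse:
                allowed.setdefault((i, j), set()).add(fine)
    return allowed


# Assumed coarse posteriors for a three-word sentence (made-up numbers).
coarse_chart = {
    (0, 2, "N"): 0.90,
    (0, 2, "V"): 0.001,
    (0, 3, "S"): 0.80,
}
survivors = allowed_fine_items(coarse_chart, FINE_TO_COARSE, threshold=0.01)
```

In a multilevel setting the same filter would be applied once per level, with the surviving items of one pass restricting the chart of the next finer grammar.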
