Learning Grammar with Explicit Annotations for Subordinating Conjunctions

Data-driven approaches to parsing may suffer from data sparsity when they are entirely unsupervised. External knowledge has been shown to be an effective way to alleviate this problem. Subordinating conjunctions impose important constraints on Chinese syntactic structures. This paper proposes a method for learning a grammar that uses hierarchical category knowledge of subordinating conjunctions as explicit annotations. First, the part-of-speech tag of each subordinating conjunction is annotated with the most general category in the hierarchical knowledge. These categories are human-defined to represent distinct syntactic constraints, and they provide an appropriate starting point for splitting. Second, building on the data-driven state-split approach, we establish a mapping from each automatically refined subcategory to a category in the hierarchical knowledge. The data-driven splitting of these categories is then restricted by the knowledge to avoid over-refinement. Experiments demonstrate that constraining grammar learning with the hierarchical knowledge improves parsing performance significantly over the baseline.
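The two steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the category names, the tiny lexicon, the `CS` tag for subordinating conjunctions, and the helper names `annotate` and `may_split` are all assumptions made for illustration, and the state-split learning itself is not shown. The sketch only shows how tags could be annotated with a top-level category and how a hierarchy could bound further splitting.

```python
from typing import Dict, List, Tuple

# Hypothetical category hierarchy: each category maps to its children.
# These names are invented for illustration and are not the paper's categories.
HIERARCHY: Dict[str, List[str]] = {
    "causal": ["cause", "result"],
    "conditional": ["hypothetical", "concessive"],
    "cause": [],
    "result": [],
    "hypothetical": [],
    "concessive": [],
}

# Hypothetical lexicon: conjunction -> most general category in the hierarchy.
TOP_CATEGORY: Dict[str, str] = {"因为": "causal", "如果": "conditional"}


def annotate(tagged: List[Tuple[str, str]],
             conj_tag: str = "CS") -> List[Tuple[str, str]]:
    """Step 1: append the most general category to each conjunction's POS tag.

    `conj_tag` is an assumed tag name for subordinating conjunctions.
    Words missing from the lexicon keep their original tag.
    """
    annotated = []
    for word, tag in tagged:
        if tag == conj_tag and word in TOP_CATEGORY:
            tag = f"{tag}-{TOP_CATEGORY[word]}"
        annotated.append((word, tag))
    return annotated


def may_split(category: str) -> bool:
    """Step 2: allow a further split only if the hierarchy has finer categories.

    A leaf category (or an unknown one) is never split, which is one simple
    way to keep data-driven refinement from going past the knowledge.
    """
    return bool(HIERARCHY.get(category))


sentence = [("如果", "CS"), ("下雨", "VV")]
annotated_sentence = annotate(sentence)
```

In this sketch, a learner would consult `may_split` before refining an annotated tag, so a category such as `causal` could be divided into its children while a leaf such as `cause` would stay fixed.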
