Learning the Taxonomy of Function Words for Parsing

Completely data-driven grammar training is prone to over-fitting, and human-defined word class knowledge can help address this issue. However, a manually built word class taxonomy may be unreliable or ill-suited to statistical natural language processing, and it often covers linguistic phenomena insufficiently and adapts poorly to new domains. In this paper, a formalized representation of function word subcategorization for parsing is developed automatically. The function word classification, which represents intrinsic features of syntactic usage, is used to supervise grammar induction, and the structure of the taxonomy is learned simultaneously. Grammar learning is thus no longer training that is unilaterally supervised by hierarchical knowledge, but an interactive process between knowledge structure learning and grammar training. The resulting taxonomy implies the stochastic significance of the diverse syntactic features. Experiments on both the Penn Chinese Treebank and the Tsinghua Treebank show that the proposed method improves parsing performance by 1.6% and 7.6%, respectively, over the baseline.
