Stacking Heterogeneous Joint Models of Chinese POS Tagging and Dependency Parsing

Previous joint models of Chinese part-of-speech (POS) tagging and dependency parsing are extended from either graph- or transition-based dependency models. Our analysis shows that the two models have different error distributions. In addition, integrating graph- and transition-based dependency parsers by stacked learning (stacking) has achieved significant improvements. These observations motivate us to study the problem of stacking graph- and transition-based joint models. We conduct experiments on Chinese Penn Treebank 5.1 (CTB5.1). The results demonstrate that the guided transition-based joint model obtains better performance than the guided graph-based joint model. Further, we introduce a constituent-based joint model that derives the POS tag sequence and dependency tree from the output of PCFG parsers, and then integrate it into the guided transition-based joint model. Finally, we achieve the best reported performance on CTB5.1: 94.95% tagging accuracy and 83.98% parsing accuracy.
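The stacking setup the abstract refers to can be illustrated with a minimal, generic sketch: a level-0 (base) model is trained with k-fold jackknifing to produce guide analyses for the training data, and a level-1 (guided) model is then trained with those guide analyses as extra input. All class and method names below (the base_model_cls / guided_model_cls interfaces with train and predict) are hypothetical placeholders, not the paper's actual implementation.

# Minimal sketch of stacked learning (stacking) for model integration.
# The model interfaces used here are assumptions for illustration only.
from typing import List, Tuple

Sentence = List[str]                    # words of one sentence
Analysis = Tuple[List[str], List[int]]  # (POS tags, head index per word)

def cross_validated_guides(base_model_cls, sentences, gold, k=10):
    """Produce level-0 guide predictions for the training set.

    The base model is trained on k-1 folds and used to predict the
    held-out fold (k-fold jackknifing), so the guided model never sees
    overly optimistic guide features during training.
    """
    guides = [None] * len(sentences)
    folds = [list(range(i, len(sentences), k)) for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(sentences)) if i not in held_out]
        base = base_model_cls()
        base.train([sentences[i] for i in train_idx],
                   [gold[i] for i in train_idx])
        for i in fold:
            guides[i] = base.predict(sentences[i])  # level-0 Analysis
    return guides

def train_stacked(base_model_cls, guided_model_cls, sentences, gold):
    # Level 0: guide predictions for the training data via jackknifing.
    guides = cross_validated_guides(base_model_cls, sentences, gold)
    # Level 1: the guided model sees the sentence plus the guide analysis
    # (e.g. guide POS tags and guide heads as additional features).
    guided = guided_model_cls()
    guided.train(sentences, guides, gold)
    # At test time, a base model trained on all data supplies the guide.
    base = base_model_cls()
    base.train(sentences, gold)
    return base, guided

def parse(base, guided, sentence):
    guide = base.predict(sentence)
    return guided.predict(sentence, guide)

The same recipe covers the paper's variants: the base model can be a graph-based joint model, a transition-based joint model, or a constituent-based joint model whose POS tags and dependencies are read off a PCFG parser's output, while the guided model consumes the guide analysis as extra features.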
