Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese. Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models. We also describe our method to align comparable states in the beam, and how we can combine features of different characteristics in our incremental framework. In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing. We also perform comparison experiments with the partially joint models.

[1]  Bo Xu,et al.  Probabilistic Parsing Action Models for Multi-Lingual Dependency Parsing , 2007, EMNLP.

[2]  F. Xia,et al.  The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0) , 2000 .

[3]  Joakim Nivre,et al.  Transition-based Dependency Parsing with Rich Non-local Features , 2011, ACL.

[4]  Noah A. Smith,et al.  Joint Morphological and Syntactic Disambiguation , 2007, EMNLP.

[5]  Haizhou Li,et al.  Joint Models for Chinese POS Tagging and Dependency Parsing , 2011, EMNLP.

[6]  Qun Liu,et al.  A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging , 2008, ACL.

[7]  Stephen Clark,et al.  A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model , 2010, EMNLP.

[8]  Philipp Koehn,et al.  Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) , 2007 .

[9]  Yoshimasa Tsuruoka,et al.  Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data , 2011, IJCNLP.

[10]  Weiwei Sun,et al.  A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging , 2011, ACL.

[11]  Kenji Sagae,et al.  Dynamic Programming for Linear-Time Incremental Parsing , 2010, ACL.

[12]  Jun'ichi Tsujii,et al.  Incremental Joint POS Tagging and Dependency Parsing in Chinese , 2011, IJCNLP.

[13]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[14]  Hitoshi Isahara,et al.  An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging , 2009, ACL/IJCNLP.

[15]  Stephen Clark,et al.  Joint Word Segmentation and POS Tagging Using a Single Perceptron , 2008, ACL.

[16]  Reut Tsarfaty,et al.  A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing , 2008, ACL.

[17]  SproatRichard,et al.  A stochastic finite-state word-segmentation algorithm for Chinese , 1996 .

[18]  Xiaoqiang Luo A Maximum Entropy Chinese Character-Based Parser , 2003, EMNLP.