Joint Word Segmentation and POS Tagging Using a Single Perceptron

For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and to improve segmentation by utilizing POS information, segmentation and tagging can be performed simultaneously. A challenge for this joint approach is the large combined search space, which makes efficient decoding very hard. Recent research has explored the integration of segmentation and POS tagging by decoding under restricted versions of the full combined search space. In this paper, we propose a joint segmentation and POS tagging model that does not impose any hard constraints on the interaction between word and POS information. Fast decoding is achieved by using a novel multiple-beam search algorithm. The system uses a discriminative statistical model, trained using the generalized perceptron algorithm. Compared to the traditional pipeline approach, the joint model gives an error reduction of 14.6% in segmentation accuracy and 12.2% in tagging accuracy.
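To make the idea of joint decoding with a perceptron-trained linear model concrete, the following is a minimal sketch, not the system described in the abstract. It assumes a toy two-tag set, two illustrative feature templates (word/tag and tag bigram), and a plain non-averaged perceptron update. The names `TAGS`, `features`, `decode`, `train`, `beam_size` and `max_word_len` are all hypothetical. The decoder keeps one small beam per character position, which is only one possible reading of "multiple-beam"; the actual algorithm and feature set are not specified in this text.

```python
from collections import defaultdict

TAGS = ("N", "V")  # hypothetical toy tag set, for illustration only


def features(words):
    """Return toy features for a list of (word, tag) pairs:
    one word/tag feature and one tag-bigram feature per word."""
    feats = []
    prev_tag = "<s>"
    for word, tag in words:
        feats.append(("wt", word, tag))
        feats.append(("tt", prev_tag, tag))
        prev_tag = tag
    return feats


def score(weights, words):
    """Linear model score: sum of the weights of all active features."""
    return sum(weights[f] for f in features(words))


def decode(sentence, weights, beam_size=4, max_word_len=3):
    """Search jointly over segmentations and tag sequences.

    agenda[i] holds the best partial analyses covering sentence[:i].
    Each step extends shorter analyses by one tagged word ending at i.
    """
    agenda = [[] for _ in range(len(sentence) + 1)]
    agenda[0] = [()]
    for end in range(1, len(sentence) + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            for prefix in agenda[start]:
                for tag in TAGS:
                    candidates.append(prefix + ((word, tag),))
        # Keep only the highest-scoring partial analyses (the beam).
        candidates.sort(key=lambda c: score(weights, c), reverse=True)
        agenda[end] = candidates[:beam_size]
    return list(agenda[-1][0])


def train(data, epochs=5, **decode_args):
    """Perceptron training: on a wrong prediction, reward the gold
    features and penalize the predicted features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, gold in data:
            pred = decode(sentence, weights, **decode_args)
            if pred != gold:
                for f in features(gold):
                    weights[f] += 1.0
                for f in features(pred):
                    weights[f] -= 1.0
    return weights
```

Calling `train` on a list of `(sentence, gold_analysis)` pairs, where each gold analysis is a list of `(word, tag)` tuples, returns a weight table that `decode` can then use on unsegmented strings. Because segmentation and tagging are scored by the same feature weights, a tag-bigram feature can change which word boundaries win, which is the interaction a pipeline cannot express.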
