Joint Word Segmentation, POS-Tagging and Syntactic Chunking

Chinese chunking has traditionally been studied under the assumption of gold-standard word segmentation. We find that chunking accuracy drops drastically when automatic segmentation is used instead. Motivated by the observation that chunking knowledge can potentially improve segmentation, we explore a joint model that performs word segmentation, POS-tagging and chunking simultaneously. In addition, to address the sparsity of full chunk features, we employ a semi-supervised method that derives chunk cluster features from large-scale automatically chunked data. Experimental results demonstrate the effectiveness of the joint model combined with these semi-supervised features.
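The abstract does not say how the three tasks are folded into one model. One common way to make segmentation, POS-tagging and chunking a single prediction problem gives every character a composite label. That label combines:

- a word-boundary tag (B/I),
- the POS of the word containing the character,
- a chunk tag (B-/I- plus chunk type, or O).

This encoding is assumed here purely for illustration and is not necessarily the joint model used in this work. The sketch below shows it. The names `encode`, `decode`, the `seg|pos|chunk` label format and the example tags are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical input format: a sentence is a list of (chunk_type, words) groups,
# each word being (form, pos). By convention here, every word outside a chunk
# forms its own ("O", [word]) group so that decode() can invert encode().
Word = Tuple[str, str]
Sentence = List[Tuple[str, List[Word]]]
Tagged = List[Tuple[str, str]]


def encode(sentence: Sentence) -> Tagged:
    """Give every character one composite label "seg|pos|chunk"."""
    out: Tagged = []
    for chunk_type, words in sentence:
        for w_idx, (form, pos) in enumerate(words):
            for c_idx, ch in enumerate(form):
                seg = "B" if c_idx == 0 else "I"  # word boundary
                if chunk_type == "O":
                    chunk = "O"
                else:
                    # Only the first character of the chunk's first word opens it.
                    opens = w_idx == 0 and c_idx == 0
                    chunk = ("B-" if opens else "I-") + chunk_type
                out.append((ch, f"{seg}|{pos}|{chunk}"))
    return out


def decode(tagged: Tagged) -> Sentence:
    """Recover words, POS tags and chunks from a well-formed label sequence.

    No validity checks: an ill-formed sequence (e.g. "I" as the very first
    segmentation tag) raises IndexError rather than being repaired.
    """
    sentence: Sentence = []
    for ch, label in tagged:
        seg, pos, chunk = label.split("|")
        # A new group starts at a B- chunk tag, or at each word outside chunks.
        if chunk.startswith("B-") or (chunk == "O" and seg == "B"):
            sentence.append(("O" if chunk == "O" else chunk[2:], []))
        words = sentence[-1][1]
        if seg == "B":
            words.append((ch, pos))  # POS is read from the word's first character
        else:
            form, word_pos = words[-1]
            words[-1] = (form + ch, word_pos)
    return sentence
```

For example, `encode([("NP", [("中国", "NR"), ("经济", "NN")]), ("VP", [("发展", "VV")]), ("O", [("。", "PU")])])` labels the first character as `B|NR|B-NP`. It labels 经 as `B|NN|I-NP`, and `decode` maps the result back to the input. A real joint decoder would additionally need to forbid invalid label transitions, which this sketch does not enforce.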
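The abstract does not define how the chunk cluster features are computed from the automatically chunked data. The sketch below assumes a simple frequency-based clustering, offered only as an illustration and not as the method of this work:

- every chunk (type plus word sequence) observed in the auto-chunked corpus is ranked by count;
- each chunk is mapped to a coarse bucket (HF, MF, LF);
- the bucket ID replaces the sparse full-chunk string as a feature.

The function names `chunk_clusters` and `cluster_feature` are hypothetical, as are the thresholds `high` and `mid` and the `ZERO` label for unseen chunks.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A chunk is identified by its type and its word sequence, e.g. ("NP", ("中国", "经济")).
Chunk = Tuple[str, Tuple[str, ...]]


def chunk_clusters(auto_chunked: Iterable[List[Chunk]],
                   high: float = 0.1, mid: float = 0.3) -> Dict[Chunk, str]:
    """Map each chunk seen in auto-chunked data to a frequency bucket.

    Chunks are ranked by count (most frequent first). Ranks below high * n
    become "HF", ranks below mid * n (a cumulative fraction) become "MF",
    and all remaining chunks become "LF".
    """
    counts = Counter(chunk for sent in auto_chunked for chunk in sent)
    ranked = [chunk for chunk, _ in counts.most_common()]
    n = len(ranked)
    clusters: Dict[Chunk, str] = {}
    for rank, chunk in enumerate(ranked):
        if rank < high * n:
            clusters[chunk] = "HF"
        elif rank < mid * n:
            clusters[chunk] = "MF"
        else:
            clusters[chunk] = "LF"
    return clusters


def cluster_feature(chunk: Chunk, clusters: Dict[Chunk, str]) -> str:
    """Dense stand-in for a sparse full-chunk feature; unseen chunks get "ZERO"."""
    return f"CLUSTER={chunk[0]}:{clusters.get(chunk, 'ZERO')}"
```

Under this assumption, thousands of rare chunk strings collapse into a handful of feature values that a joint model could fire whenever it builds a chunk during decoding. Other clusterings (for instance distributional ones) would fit the abstract's description equally well.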
