Paper Information - Chinese Word Segmentation with Dual Decomposition

Chinese Word Segmentation with Dual Decomposition

There are two dominant approaches to Chinese word segmentation: word-based and character-based models, each with respective strengths. Prior work has shown that gains in segmentation performance can be achieved from combining these two types of models; however, past efforts have not provided a practical technique to allow mainstream adoption. We propose a method that effectively combines the strength of both segmentation schemes using an efficient dual-decomposition algorithm for joint inference. Our method is simple and easy to implement. Experiments on SIGHAN 2003 and 2005 evaluation datasets show that our method achieves the best reported results to date on 6 out of 7 datasets.
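The joint inference idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two models, so everything here is an assumption made for illustration: the character-based model is reduced to independent per-position scores for "this character begins a word", the word-based model is a small lexicon decoded by dynamic programming, and all names, scores, and the step-size schedule (`step / t`) are hypothetical. Dual decomposition decodes each model separately, then adjusts the penalties `u` by subgradient steps until the two segmentations agree.

```python
from typing import Dict, List, Tuple


def decode_char_model(tag_scores: List[float], u: List[float]) -> List[int]:
    """Toy character model: y[i] = 1 means character i begins a word.
    Each position is scored independently, with the penalty u[i] added."""
    y = [1 if s + ui > 0 else 0 for s, ui in zip(tag_scores, u)]
    y[0] = 1  # the first character always begins a word
    return y


def decode_word_model(text: str, word_scores: Dict[str, float], u: List[float],
                      max_len: int = 4, oov: float = -1.0) -> List[int]:
    """Toy word model: best-scoring segmentation by dynamic programming.
    A word starting at position i pays the penalty -u[i]."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            word = text[i:j]
            if len(word) > 1 and word not in word_scores:
                continue  # only single characters may be out of vocabulary
            score = best[i] + word_scores.get(word, oov) - u[i]
            if score > best[j]:
                best[j], back[j] = score, i
    y = [0] * n
    j = n
    while j > 0:  # follow back-pointers to mark word starts
        j = back[j]
        y[j] = 1
    return y


def dual_decompose(text: str, tag_scores: List[float],
                   word_scores: Dict[str, float],
                   max_iter: int = 50, step: float = 1.0) -> Tuple[List[int], bool]:
    """Subgradient loop: returns (segmentation, agreed_flag)."""
    u = [0.0] * len(text)
    y_char = decode_char_model(tag_scores, u)
    for t in range(1, max_iter + 1):
        y_char = decode_char_model(tag_scores, u)
        y_word = decode_word_model(text, word_scores, u)
        if y_char == y_word:
            return y_char, True  # agreement: solution is jointly optimal
        rate = step / t  # assumed decaying step size
        u = [ui - rate * (c - w) for ui, c, w in zip(u, y_char, y_word)]
    return y_char, False  # no agreement: fall back to the character model


def to_words(text: str, y: List[int]) -> List[str]:
    """Convert word-start indicators into a list of words."""
    starts = [i for i, flag in enumerate(y) if flag] + [len(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:])]


# Hypothetical scores: the two toy models disagree at first.
text = "研究生命"
tag_scores = [1.0, -1.0, 0.6, -0.8]
word_scores = {"研究": 1.5, "研究生": 3.0, "生命": 1.5, "命": 0.5}

y, agreed = dual_decompose(text, tag_scores, word_scores)
print(to_words(text, y), agreed)
```

With these made-up scores the character model initially prefers 研究 / 生命 while the word model prefers 研究生 / 命; the penalties pull them to a common answer after a few iterations. When the loop ends in agreement, the shared segmentation maximizes the sum of the two toy model scores.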

Christopher D. Manning | Mengqiu Wang | Rob Voigt

[1] Christopher D. Manning, et al. Optimizing Chinese Word Segmentation for Machine Translation Performance, 2008, WMT@ACL.

[2] Keh-Jiann Chen, et al. Improving Word Alignment by Adjusting Chinese Word Segmentation, 2008, IJCNLP.

[3] Alexander M. Rush, et al. Dual Decomposition for Parsing with Non-Projective Head Automata, 2010, EMNLP.

[4] Keh-Jiann Chen, et al. Word Identification for Mandarin Chinese Sentences, 1992, COLING.

[5] John DeNero, et al. Model-Based Aligner Combination Using Dual Decomposition, 2011, ACL.

[6] Dekang Lin. Combining Language Modeling and Discriminative Classification for Word Segmentation, 2009, CICLing.

[7] Michael Collins, et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[8] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[9] Alexander M. Rush, et al. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing, 2012, J. Artif. Intell. Res.

[10] Thomas Emerson, et al. The Second International Chinese Word Segmentation Bakeoff, 2005, IJCNLP.

[11] Dan Klein, et al. An Empirical Examination of Challenges in Chinese Parsing, 2013, ACL.

[12] Changning Huang, et al. Improved Source-Channel Models for Chinese Word Segmentation, 2003, ACL.