Paper Information - Chinese Word Segmentation with Conditional Support Vector Inspired Markov Models - 字舞流文

Chinese Word Segmentation with Conditional Support Vector Inspired Markov Models

In this paper, we present our method for the SIGHAN-2010 Chinese word segmentation bake-off. This year, our focus is on training and testing quickly on the given data. Rather than using one of the common structural learning algorithms, such as conditional random fields (CRF), we design an in-house conditional support vector Markov model (CMM) framework. The method is very quick to train and also shows better accuracy than CRF. To give a fair comparison, we compare our method to CRF on three additional tasks, including CoNLL-2000 chunking and SIGHAN-3 Chinese word segmentation. The results were encouraging and indicated that the proposed CMM improves on CRF not only in accuracy but also in training time. The official results in SIGHAN-2010 also demonstrate that our method performs very well on traditional Chinese with a fine-tuned feature set.

Chongyang Zhang | Zhigang Chen | Guoping Hu | Jie-Chi Yang | Yu-Chieh Wu | Yue-Shi Lee

[1] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[2] John Platt, et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[3] Sabine Buchholz, et al. Introduction to the CoNLL-2000 Shared Task Chunking, 2000, CoNLL/LLL.

[4] Andrew McCallum, et al. Maximum Entropy Markov Models for Information Extraction and Segmentation, 2000, ICML.

[5] Yuji Matsumoto, et al. Chunking with Support Vector Machines, 2001, NAACL.

[6] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[7] Tong Zhang, et al. Text Chunking based on a Generalization of Winnow, 2002, J. Mach. Learn. Res.

[8] Ben Taskar, et al. Max-Margin Markov Networks, 2003, NIPS.

[9] Hwee Tou Ng, et al. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?, 2004, EMNLP.

[10] Yuji Matsumoto, et al. Applying Conditional Random Fields to Japanese Morphological Analysis, 2004, EMNLP.

[11] S. Sathiya Keerthi, et al. A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs, 2005, J. Mach. Learn. Res.

[12] Hwee Tou Ng, et al. A Maximum Entropy Approach to Chinese Word Segmentation, 2005, SIGHAN@IJCNLP 2005.

[13] Daniel Marcu, et al. Learning as search optimization: approximate large margin methods for structured prediction, 2005, ICML.

[14] Tong Zhang, et al. A High-Performance Semi-Supervised Learning Method for Text Chunking, 2005, ACL.

[15] Thorsten Joachims, et al. Training linear SVMs in linear time, 2006, KDD '06.

[16] Yue-Shi Lee, et al. Efficient and Robust Phrase Chunking Using Support Vector Machines, 2006, AIRS.

[17] Yu-Chieh Wu, et al. Description of the NCU Chinese Word Segmentation and Named Entity Recognition System for SIGHAN Bakeoff 2006, 2006, SIGHAN@COLING/ACL.

[18] Gina-Anne Levow, et al. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition, 2006, SIGHAN@COLING/ACL.

[19] Jun Suzuki, et al. Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach, 2007, EMNLP.

[20] Yue-Shi Lee, et al. Multilingual Deterministic Dependency Parsing Framework using Modified Finite Newton Method Support Vector Machines, 2007, EMNLP.

[21] Hai Zhao. Incorporating Global Information into Supervised Learning for Chinese Word Segmentation, 2007.

[22] Yue-Shi Lee, et al. Description of the NCU Chinese Word Segmentation and Part-of-Speech Tagging for SIGHAN Bakeoff 2007, 2008, IJCNLP.

[23] Yue-Shi Lee, et al. Robust and Efficient Chinese Word Dependency Analysis with Linear Kernel Support Vector Machines, 2008, COLING.

[24] Jun Suzuki, et al. Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data, 2008, ACL.

[25] Xiao Chen, et al. The Fourth International Chinese Language Processing Bakeoff: Chinese Word Segmentation, Named Entity Recognition and Chinese POS Tagging, 2008, IJCNLP.

[26] Yue-Shi Lee, et al. Robust and efficient multiclass SVM models for phrase pattern recognition, 2008, Pattern Recognit.

[27] Thorsten Joachims, et al. Cutting-plane training of structural SVMs, 2009, Machine Learning.