
Remove Redundancy Samples for SVM in A Chinese Word Segmentation Task

This paper proposes an algorithm that removes a large number of redundant training samples when support vector machines (SVMs) are used for Chinese word segmentation, without substantially lowering the final segmentation performance. This can, to some extent, ease the training of Chinese word segmentation models. The algorithm is fast and incurs no extra cost. Both theoretical analysis and experiments show that the algorithm is effective: it removes almost 45% of the redundant samples, while the precision of our Chinese word segmentation drops by less than 3%.
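The abstract does not say how the algorithm decides that a sample is redundant, so the sketch below is not the paper's method. It only illustrates, under stated assumptions, the setting the abstract refers to: character-based segmentation cast as per-character classification, where each training sample is a (features, label) pair. The B/M/E/S tag set, the character-window features, and the function names `char_tags`, `window_samples` and `drop_duplicates` are all hypothetical choices made here. The simplest form of redundancy in such data is an exact repeat of a (features, label) pair, which this sketch filters out.

```python
# Hypothetical illustration only: NOT the redundancy-removal algorithm of the paper.
# Assumptions: B/M/E/S character tags and a +/-k character window as features.


def char_tags(words):
    """Map a list of segmented words to per-character B/M/E/S tags (assumed tag set)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def window_samples(words, k=1):
    """Build (features, label) samples; features are the characters at offsets -k..k."""
    chars = list("".join(words))
    tags = char_tags(words)
    padded = ["<s>"] * k + chars + ["</s>"] * k  # pad sentence boundaries
    samples = []
    for i, tag in enumerate(tags):
        # chars[i] sits at padded[i + k], so its window is padded[i : i + 2k + 1]
        feats = tuple(padded[i:i + 2 * k + 1])
        samples.append((feats, tag))
    return samples


def drop_duplicates(samples):
    """Keep only the first occurrence of each identical (features, label) pair."""
    seen = set()
    kept = []
    for s in samples:
        if s not in seen:
            seen.add(s)
            kept.append(s)
    return kept


if __name__ == "__main__":
    # Toy corpus of two pre-segmented sentences (made-up example data).
    corpus = [["我们", "是", "学生"], ["我们", "是", "老师"]]
    samples = [s for sent in corpus for s in window_samples(sent)]
    kept = drop_duplicates(samples)
    print(len(samples), len(kept))  # prints "10 8"
```

Dropping exact duplicates leaves a hard-margin SVM solution unchanged, but in a soft-margin SVM a duplicated point behaves like a single point with twice the penalty weight C, so even this simple filter can shift the learned model slightly. Samples with identical features but different labels are kept, since they are not redundant in this sense.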

Feiliang Ren | Tianshun Yao
