Removing Redundant Samples for SVM in a Chinese Word Segmentation Task

This paper proposes an algorithm that removes a large number of redundant samples when an SVM is used for Chinese word segmentation, without much loss in final experimental performance. This eases the training for Chinese word segmentation to a certain extent. The algorithm is fast and incurs no extra cost. Both theoretical analysis and experiments show that the algorithm works well: it removes almost 45% of the redundant samples, while the precision of our Chinese word segmentation drops by less than 3%.