Feature-based Thai Word Segmentation
Word segmentation is a problem in several Asian languages that have no explicit word boundary delimiter, e.g. Chinese, Japanese, Korean and Thai. We propose to use feature-based approaches for Thai word segmentation. A feature can be anything that tests for specific information in the context around the word in question, such as context words and collocations. To automatically extract such features from a training corpus, we employ two learning algorithms, namely RIPPER and Winnow. Experimental results show that both algorithms appear to outperform the existing Thai word segmentation methods, especially for context-dependent strings.
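To make the idea of context-word and collocation features concrete, the following is a minimal sketch of a Winnow-style learner over such features. It is not the authors' implementation: the function `extract_features`, the class `Winnow`, the window size, the threshold, the promotion factor `alpha`, and the placeholder tokens are all assumptions chosen for illustration, and the collocation feature is simplified to the single adjacent word on each side.

```python
from collections import defaultdict


def extract_features(left, right, window=2):
    """Return binary features for a candidate word given its context.

    left / right are lists of tokens before and after the candidate.
    Hypothetical simplification: context-word features come from a
    +/- `window` span; collocation features use only the adjacent token.
    """
    feats = set()
    for w in left[-window:] + right[:window]:
        feats.add(("ctx", w))            # context word within the window
    if left:
        feats.add(("col", left[-1], "_"))  # word immediately to the left
    if right:
        feats.add(("col", "_", right[0]))  # word immediately to the right
    return feats


class Winnow:
    """Mistake-driven linear-threshold learner with multiplicative updates."""

    def __init__(self, threshold=3.0, alpha=2.0):
        self.w = defaultdict(lambda: 1.0)  # every feature starts at weight 1
        self.threshold = threshold         # assumed value, not from the paper
        self.alpha = alpha                 # promotion/demotion factor

    def score(self, feats):
        return sum(self.w[f] for f in feats)

    def predict(self, feats):
        return self.score(feats) >= self.threshold

    def update(self, feats, label):
        """Adjust weights only on a mistake; return True if weights changed."""
        if self.predict(feats) == label:
            return False
        factor = self.alpha if label else 1.0 / self.alpha
        for f in feats:
            self.w[f] *= factor            # promote on miss, demote on false alarm
        return True


# Toy usage with placeholder tokens (not real Thai data).
pos = extract_features(["a", "b"], ["c"])  # context where the segmentation is correct
neg = extract_features(["a", "y"], ["c"])  # context where it is incorrect
model = Winnow()
for _ in range(5):
    model.update(pos, True)
    model.update(neg, False)
```

In this sketch one classifier decides whether a single candidate segmentation fits its context; a full segmenter would need one such decision per ambiguous string, which is outside the scope of the example.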