Feature-based Thai Word Segmentation

Word segmentation is a problem in several Asian languages that have no explicit word boundary delimiter, such as Chinese, Japanese, Korean and Thai. We propose feature-based approaches to Thai word segmentation. A feature can be anything that tests for specific information in the context around the word in question, such as context words and collocations. To extract such features automatically from a training corpus, we employ two learning algorithms, RIPPER and Winnow. Experimental results show that both algorithms appear to outperform existing Thai word segmentation methods, especially for context-dependent strings.
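To make the idea of context features concrete, here is a minimal Python sketch of the Winnow side. It uses the basic positive, mistake-driven Winnow of Littlestone (1988) over sparse binary features. Several pieces are assumptions for illustration only and do not come from the paper: an ambiguous string is framed as a binary choice between two candidate segmentations; the feature templates (`ctx=`, `L1=`, `R1=`, `L1_R1=`) and the window of 2 are hypothetical; the parameters `alpha=2`, `beta=0.5`, `theta=8` are arbitrary; and the data is made up, with English placeholder words standing in for Thai. RIPPER, a rule learner, is not sketched.

```python
def extract_features(left, right, window=2):
    """Active binary features for one ambiguous string in context.

    left / right: already-segmented words before / after the ambiguous string.
    The feature templates and window size are hypothetical, illustrative choices.
    """
    feats = set()
    # Context-word features: a word occurs within `window` words on either side.
    for w in left[-window:] + right[:window]:
        feats.add(f"ctx={w}")
    # Collocation features: the adjacent words, alone and as a pair.
    l1 = left[-1] if left else "<s>"
    r1 = right[0] if right else "</s>"
    feats.update({f"L1={l1}", f"R1={r1}", f"L1_R1={l1}_{r1}"})
    return feats


class Winnow:
    """Basic (positive) Winnow over sparse binary features."""

    def __init__(self, alpha=2.0, beta=0.5, theta=8.0):
        self.alpha = alpha  # promotion factor (> 1), applied on missed positives
        self.beta = beta    # demotion factor (< 1), applied on false positives
        self.theta = theta  # decision threshold on the summed weights
        self.w = {}         # feature -> weight; unseen features weigh 1.0

    def score(self, feats):
        return sum(self.w.get(f, 1.0) for f in feats)

    def predict(self, feats):
        return 1 if self.score(feats) >= self.theta else 0

    def update(self, feats, label):
        """Update only the active features, and only on a mistake."""
        if self.predict(feats) == label:
            return False
        factor = self.alpha if label == 1 else self.beta
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor
        return True

    def train(self, examples, epochs=3):
        """Return the number of mistakes made in each pass over the data."""
        mistakes = []
        for _ in range(epochs):
            mistakes.append(sum(self.update(f, y) for f, y in examples))
        return mistakes


# Toy, made-up data: label 1 = one segmentation of the ambiguous string,
# label 0 = the other. English placeholders stand in for Thai words.
data = [
    (["the", "wind"], ["blows", "hard"], 1),
    (["cold", "wind"], ["blows"], 1),
    (["my", "eye"], ["is", "round"], 0),
    (["the", "eye"], ["is", "round"], 0),
]
examples = [(extract_features(l, r), y) for l, r, y in data]

model = Winnow()
print(model.train(examples))                                      # [2, 0, 0]
print(model.predict(extract_features(["a", "wind"], ["blows"])))  # 1
print(model.predict(extract_features(["his", "eye"], ["is"])))    # 0
```

Winnow updates weights multiplicatively and only after a mistake. In this toy run, informative collocations such as `L1=wind` are promoted to 2.0, while the shared, uninformative context word `ctx=the` is pushed back to its neutral weight of 1.0. This behavior makes Winnow a reasonable fit for the large, sparse feature spaces that context-word and collocation templates produce.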

Surapant Meknavin
