Segmentation plays an important role in natural language processing (NLP), especially for languages whose sentences are not easily separated into morphemes. In this study we propose a method for segmenting a sentence. The system described in this paper does not use any grammatical information or knowledge in processing. Instead, it uses statistical information drawn from a non-tagged corpus of the target language. Most segmentation systems are designed to pick out conventional morphemes, which are defined for human use. However, we still do not know whether those conventional morphemes are good units for computational processing. In this paper we explain our system's algorithm and report its experimental results on Japanese, although the system is not designed for any particular language.