Character-based Collocation for Mandarin Chinese

This paper describes a character-based Chinese collocation system and discusses its advantages over a traditional word-based system. Since word breaks are not conventionally marked in Chinese text corpora, a character-based collocation system has the dual advantages of avoiding pre-processing distortion and directly accessing sub-lexical information. Furthermore, word-based collocational properties can be obtained through an auxiliary module of automatic segmentation.

[...] corpora as they are, we will be able to access sub-lexical information without additional cost. To take full advantage of the nature of the texts, reliable tools can also be devised to obtain lexical collocation. In this paper, we will describe the design and implementation of a Chinese collocational system that does not require the preprocessing of automatic segmentation but is able to allow both lexical and sub-lexical information to be automatically extracted.
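
To make the idea of collocation over unsegmented text concrete, the following is a minimal sketch in Python. It is not the system described in this paper: the function name `char_collocates`, the `window` parameter, and the sample sentence are all hypothetical and chosen only for illustration. The sketch counts the characters that occur within a fixed window around every occurrence of a key string, working directly on the raw character sequence with no segmentation step.

```python
from collections import Counter


def char_collocates(text, key, window=2):
    """Count characters found within `window` positions of each occurrence of `key`.

    Illustrative assumption, not the paper's algorithm: `text` is raw,
    unsegmented Chinese text, and `key` may be a single character
    (sub-lexical query) or a multi-character string (word-like query).
    """
    counts = Counter()
    key_len = len(key)
    start = text.find(key)
    while start != -1:
        # Characters immediately to the left and right of this occurrence.
        left = text[max(0, start - window):start]
        right = text[start + key_len:start + key_len + window]
        counts.update(left)
        counts.update(right)
        # Continue searching from the next position (overlapping matches allowed).
        start = text.find(key, start + 1)
    return counts


if __name__ == "__main__":
    sample = "我喜歡吃蘋果我喜歡喝茶"  # hypothetical sample text
    print(char_collocates(sample, "喜歡", window=1))
    print(char_collocates(sample, "喜", window=1))
```

Because the key is matched as a plain character string, the same routine serves both a single-character query and a multi-character, word-like query, which mirrors the point that lexical and sub-lexical information can be drawn from the same unsegmented text.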