Chinese Word Segmentation in MSR-NLP
Word segmentation in MSR-NLP is an integral part of a sentence analyzer which includes basic segmentation, derivational morphology, named entity recognition, new word identification, word lattice pruning and parsing. The final segmentation is produced from the leaves of parse trees. The output can be customized to meet different segmentation standards through the value combinations of a set of parameters. The system participated in four tracks of the segmentation bakeoff (PK-open, PK-closed, CTB-open and CTB-closed) and ranked #1, #2, #2 and #3, respectively, in those tracks. Analysis of the results shows that each component of the system contributed to the scores.
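The idea of reading a segmentation off the leaves of a parse tree, with parameters deciding how finely word-internal structure is split, can be illustrated with a minimal sketch. Everything named here (`Node`, its `kind` tag, the `split_kinds` parameter set, and the toy tree) is a hypothetical construction for illustration only, not the data structures or parameter names of the MSR-NLP system.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Toy parse-tree node (hypothetical, not the MSR-NLP representation)."""
    label: str                                  # category label, e.g. "NP"
    text: str = ""                              # surface string, set on leaves only
    children: List["Node"] = field(default_factory=list)
    kind: str = ""                              # non-empty marks a word-internal structure


def surface(node: Node) -> str:
    """Concatenate the leaf strings under a node, left to right."""
    if not node.children:
        return node.text
    return "".join(surface(child) for child in node.children)


def segment(node: Node, split_kinds: Set[str]) -> List[str]:
    """Collect words from the tree.

    A node tagged with a `kind` is emitted as one word unless that kind
    is listed in `split_kinds`, in which case its parts are emitted
    separately. `split_kinds` stands in for a set of customization
    parameters (an assumption made for this sketch).
    """
    if not node.children:
        return [node.text]
    if node.kind and node.kind not in split_kinds:
        return [surface(node)]
    words: List[str] = []
    for child in node.children:
        words.extend(segment(child, split_kinds))
    return words


# Toy analysis of 朋友们来了 ("the friends have come"):
# 朋友们 is 朋友 "friend" plus the plural suffix 们.
tree = Node("S", children=[
    Node("NP", kind="suffixed", children=[
        Node("Noun", "朋友"),
        Node("Suffix", "们"),
    ]),
    Node("VP", children=[
        Node("Verb", "来"),
        Node("Particle", "了"),
    ]),
])

coarse = segment(tree, split_kinds=set())        # keep derived words whole
fine = segment(tree, split_kinds={"suffixed"})   # split off the suffix
```

With an empty parameter set the suffixed noun stays a single word; adding `"suffixed"` to the set splits it into stem and suffix. Both outputs come from the same parse, which is the sense in which one analysis can serve several segmentation standards.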