Paper information - Method of sentence segmentation and punctuating for ancient Chinese literatures based on cascaded CRF

Method of sentence segmentation and punctuating for ancient Chinese literatures based on cascaded CRF

Data sparseness is a primary challenge when natural language processing techniques are applied to sentence segmentation and punctuation of ancient Chinese literature. To overcome this difficulty, a 6-tag set is designed and a method based on cascaded conditional random fields (CRFs) is proposed. The main idea is as follows: based on the 6-tag set, a low-level model determines sentence boundaries from the observation sequence, and a high-level model punctuates the sentences by considering both the observation sequence and the low-level model's results. A closed test and an open test were each carried out on a mixed corpus of approximately 5M. In the closed test, the F-measures for sentence segmentation and punctuation were 96.48% and 91.35%, respectively; in the open test, they were 71.42% and 67.67%, respectively.
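The cascade described in the abstract can be illustrated with a minimal Python sketch of the data flow only. It is not the authors' implementation: the abstract does not list the 6-tag set or the feature templates, so the tag names (`END`, `IN`), the feature names, and the two rule-based taggers `toy_low` and `toy_high` are hypothetical stand-ins for trained CRF models. The one point the sketch is meant to show is that the high-level model sees both the original observations and the low-level model's output.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]
Tagger = Callable[[List[Features]], List[str]]


def char_features(chars: List[str], i: int) -> Features:
    """Observation features for position i (hypothetical feature set)."""
    return {
        "char": chars[i],
        "prev": chars[i - 1] if i > 0 else "<BOS>",
        "next": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }


def cascade(chars: List[str], low: Tagger, high: Tagger) -> List[str]:
    """Run the low-level tagger, then pass its output to the high-level tagger."""
    obs = [char_features(chars, i) for i in range(len(chars))]
    low_tags = low(obs)
    # The high-level input keeps every observation feature and adds the
    # low-level decision as one extra feature.
    high_obs = [dict(f, low_tag=t) for f, t in zip(obs, low_tags)]
    return high(high_obs)


def toy_low(obs: List[Features]) -> List[str]:
    """Stand-in for the low-level CRF: a toy rule, not a learned model."""
    return ["END" if f["char"] == "乎" else "IN" for f in obs]


def toy_high(obs: List[Features]) -> List[str]:
    """Stand-in for the high-level CRF: picks a mark wherever low_tag is END."""
    return ["。" if f["low_tag"] == "END" else "" for f in obs]


def render(chars: List[str], marks: List[str]) -> str:
    """Interleave each character with the mark predicted after it."""
    return "".join(c + m for c, m in zip(chars, marks))


chars = list("學而時習之不亦說乎有朋自遠方來不亦樂乎")
print(render(chars, cascade(chars, toy_low, toy_high)))
```

In a real system each stand-in would be replaced by a CRF trained on tagged text, and the high-level model would choose among several punctuation marks rather than always emitting a full stop.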

Zhou Wei-dong