Method of sentence segmentation and punctuating for ancient Chinese literatures based on cascaded CRF
Data sparseness is a primary challenge in sentence segmentation and punctuation of ancient Chinese literature using natural language processing technology. To overcome this difficulty, a 6-tag set was designed and a method based on cascaded conditional random fields (CRF) was proposed. The main idea is as follows: based on the 6-tag set, a low-level model determines sentence boundaries from the observation sequence, and a high-level model punctuates the sentences by considering both the observation sequence and the low-level model's results. Closed and open tests were carried out on a mixed corpus of approximately 5M. In the closed test, the F-measures of sentence segmentation and punctuation were 96.48% and 91.35%, respectively; in the open test, they were 71.42% and 67.67%, respectively.
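The cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the 6-tag set, so the sketch substitutes a hypothetical two-layer labelling (`E`/`O` for boundaries, the punctuation mark itself or `O` for the punctuation layer). The function names, the feature templates, and the `PUNCT` inventory are likewise assumptions made for illustration. The sketch shows how punctuated text could be turned into training labels for both levels, how the high-level features could incorporate the low-level output, and how an F-measure over non-`O` tags could be computed. The feature dictionaries would then be passed to any CRF toolkit, which is omitted here.

```python
# Hypothetical punctuation inventory (an assumption, not taken from the paper).
PUNCT = set(",。?!;:、")

def to_training_example(punctuated):
    """Strip punctuation and derive two label layers from it.

    low  : "E" if a mark follows the character (clause/sentence end), else "O"
    high : the mark that follows the character, or "O" if none
    This labelling is a stand-in for the paper's 6-tag set, which is not given.
    """
    chars, low, high = [], [], []
    for ch in punctuated:
        if ch in PUNCT:
            if chars:  # attach the mark to the preceding character
                low[-1] = "E"
                high[-1] = ch
        else:
            chars.append(ch)
            low.append("O")
            high.append("O")
    return chars, low, high

def low_features(chars, i):
    """Observation-only features for the low-level (boundary) model."""
    return {
        "c0": chars[i],
        "c-1": chars[i - 1] if i > 0 else "<s>",
        "c+1": chars[i + 1] if i + 1 < len(chars) else "</s>",
    }

def high_features(chars, low_tags, i):
    """Features for the high-level (punctuation) model:
    the observation features plus the low-level model's tags."""
    feats = low_features(chars, i)
    feats["low0"] = low_tags[i]
    feats["low-1"] = low_tags[i - 1] if i > 0 else "<s>"
    return feats

def f_measure(gold, pred, negative="O"):
    """Harmonic mean of precision and recall over non-negative tags."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    n_pred = sum(1 for p in pred if p != negative)
    n_gold = sum(1 for g in gold if g != negative)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    chars, low, high = to_training_example("學而時習之,不亦說乎?")
    print("".join(chars))
    print(low)
    print(high)
    print(high_features(chars, low, 4))
```

In this sketch the high-level features read the low-level tags directly. At prediction time those tags would come from the low-level model's output rather than from gold labels, so errors in the first stage propagate to the second.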