Context-based Approach to Combinational Ambiguity Resolution in Chinese Word Segmentation

Combinational ambiguity is a challenging problem in Chinese word segmentation because resolving it depends on contextual information. This paper collects contextual statistics for combinationally ambiguous words and builds a context model using the log-likelihood ratio. A weighting formula is designed that takes into account the window size, position, and frequency of the contextual information. On this basis, two disambiguation methods are investigated: one uses the maximum log-likelihood ratio found in the contextual information; the other uses the larger of the summed log-likelihood ratios for the combined and the separated readings. Tested on 14 high-frequency ambiguous words, the former method reaches an average accuracy of 84.93% and the latter 95.60%. The experimental results show that combining contextual information is effective for disambiguation.
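
The two decision rules summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`llr`, `decide_by_max`, `decide_by_sum`), the labels "combined" and "separated", and the toy scores are hypothetical. The log-likelihood ratio is assumed to be the standard G-squared statistic for a 2x2 contingency table, and the paper's weighting by window size, position, and frequency is omitted because the abstract does not give its form.

```python
import math

def llr(k11, k12, k21, k22):
    """G-squared log-likelihood ratio for a 2x2 contingency table of counts.

    Assumption: this is the standard statistic; the paper's exact variant
    is not specified in the abstract.
    """
    def h(*counts):
        total = sum(counts)
        return sum(k * math.log(k / total) for k in counts if k > 0)
    return 2.0 * (h(k11, k12, k21, k22)
                  - h(k11 + k12, k21 + k22)   # row totals
                  - h(k11 + k21, k12 + k22))  # column totals

def decide_by_max(context, scores):
    """Pick the reading whose single best context word has the highest score."""
    best = {label: max((table.get(w, 0.0) for w in context), default=0.0)
            for label, table in scores.items()}
    return max(best, key=best.get)

def decide_by_sum(context, scores):
    """Pick the reading with the highest total score over all context words."""
    total = {label: sum(table.get(w, 0.0) for w in context)
             for label, table in scores.items()}
    return max(total, key=total.get)

# Hypothetical precomputed scores: reading -> {context word -> LLR}.
toy_scores = {
    "combined":  {"a": 5.0, "b": 1.0},
    "separated": {"c": 3.0, "d": 3.0},
}
toy_context = ["a", "c", "d"]

print(decide_by_max(toy_context, toy_scores))
print(decide_by_sum(toy_context, toy_scores))
```

In this toy case the two rules disagree: one strong cue favors the combined reading, while two moderate cues together favor the separated reading. This is the kind of situation in which pooling evidence across the context can change the decision.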