A new method for computing character string similarity based on fusing multiple edit distances

When computing the edit distance of strings composed of Chinese and Western characters, a Chinese character is conventionally treated as equivalent to a single Western character. Taking Chinese input methods into account, this paper proposes a new way to calculate edit distance based on the PinYin code and WuBi code of each Chinese character. It also proposes an algorithm that fuses the three edit distances (character-level, PinYin-based, and WuBi-based) to obtain string similarity. Experimental results show that the new method improves the recall rate of approximately duplicate record detection while maintaining a high precision rate.
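
The idea can be illustrated with a minimal Python sketch. The abstract does not give the fusion rule, so the weighted average used here, the equal default weights, the normalization by the longer string length, and all function names are assumptions made for illustration only. The tiny PINYIN and WUBI tables are likewise illustrative placeholders, not a real code table; characters missing from a table are passed through unchanged.

```python
from typing import Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance into [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Illustrative placeholder tables; replace with a complete code table in practice.
PINYIN: Dict[str, str] = {"张": "zhang", "章": "zhang", "王": "wang", "汪": "wang"}
WUBI: Dict[str, str] = {"张": "xtay", "章": "ujj", "王": "gggg", "汪": "igg"}


def encode(s: str, table: Dict[str, str]) -> str:
    """Replace each character by its code; unknown characters stay as they are."""
    return "".join(table.get(ch, ch) for ch in s)


def fused_similarity(a: str, b: str,
                     weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of character, PinYin and WuBi similarities (assumed rule)."""
    w_char, w_pinyin, w_wubi = weights
    return (w_char * similarity(a, b)
            + w_pinyin * similarity(encode(a, PINYIN), encode(b, PINYIN))
            + w_wubi * similarity(encode(a, WUBI), encode(b, WUBI)))


if __name__ == "__main__":
    # Homophones: the characters differ, but the PinYin codes are identical.
    print(similarity("张三", "章三"))
    print(fused_similarity("张三", "章三"))
```

In this sketch, a pair such as "张三" and "章三" scores higher under the fused measure than under the plain character-level measure, because the PinYin codes of the differing characters coincide. This is the kind of near-duplicate that a character-only edit distance tends to miss. Concatenating the codes discards character boundaries, which is a simplification of this sketch.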