Chinese Sentence Similarity Based on Multi-feature Combination

Chinese sentence similarity computation is widely used in Chinese information processing. Many methods have been proposed to measure the similarity of Chinese sentences, but most rely on only one or two features, such as words, structure, or semantic information, and their accuracy is usually limited. In this paper, we present a new approach that computes the similarity of Chinese sentences based on multi-feature combination. The method defines the key features for similarity computation and then combines their contributions to obtain the sentence similarity. Experiments show that this method achieves higher accuracy in Chinese sentence similarity computation.
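
To make the idea of combining feature contributions concrete, the following is a minimal sketch and not the method of the paper. The abstract does not name the features or the combination rule, so everything here is an assumption for illustration: the two features (`word_overlap`, `length_similarity`), the weighted-average rule in `combined_similarity`, and the weights. Sentences are assumed to be already segmented into word lists.

```python
from typing import Callable, Dict, Sequence

Feature = Callable[[Sequence[str], Sequence[str]], float]


def word_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical feature: Jaccard overlap of the two word sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty sentences are treated as identical
    return len(sa & sb) / len(sa | sb)


def length_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical feature: 1 minus the normalised length difference."""
    if not a and not b:
        return 1.0
    return 1.0 - abs(len(a) - len(b)) / (len(a) + len(b))


def combined_similarity(
    a: Sequence[str],
    b: Sequence[str],
    features: Dict[str, Feature],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-feature scores (assumed combination rule)."""
    total = sum(weights[name] for name in features)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[name] * fn(a, b) for name, fn in features.items())
    return score / total


# Illustrative features and weights; these values are not from the paper.
FEATURES: Dict[str, Feature] = {
    "overlap": word_overlap,
    "length": length_similarity,
}
WEIGHTS: Dict[str, float] = {"overlap": 0.7, "length": 0.3}

if __name__ == "__main__":
    s1 = ["我", "喜欢", "苹果"]
    s2 = ["我", "喜欢", "香蕉"]
    print(combined_similarity(s1, s2, FEATURES, WEIGHTS))
```

Because every feature returns a score in [0, 1] and the weights are normalised, the combined score also stays in [0, 1]. Structural or semantic features could be added to the dictionary in the same way.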
