Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet

Measuring the semantic similarity of words across languages is a challenging problem in data mining, information extraction, and information retrieval. This paper introduces an algorithm that measures the semantic similarity of Chinese and English words based on Chinese WordNet, an extension of WordNet to Simplified Chinese. The algorithm measures similarity not only between two Chinese words or between two English words, but also between a Chinese word and an English word. It uses WordNet's hypernym/hyponym relationships between synsets and evaluates similarity from three quantities: the distance between the synsets, the local density of the synsets, and the depth of the synsets in the WordNet hierarchy. Because most words have more than one meaning, the algorithm adaptively sets the weights of the synset pairs formed from the two words' senses. Experimental results show that the similarities measured by the algorithm generally agree with human judgment.
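
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a tiny hand-made hypernym tree in place of Chinese WordNet, and it borrows a well-known distance-and-depth form, exp(-alpha * distance) * tanh(beta * depth), as a stand-in for the paper's synset measure. The local density term is omitted. The pair weighting (each synset pair weighted in proportion to its own similarity) is a hypothetical choice that stands in for the unspecified adaptive scheme. All names, data, and parameter values are illustrative assumptions.

```python
import math

# Toy hypernym tree: child synset -> parent synset.
# Hypothetical data for illustration; this is not real WordNet content.
HYPERNYM = {
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "mouse_animal": "animal",
    "vehicle": "artifact",
    "device": "artifact",
    "car": "vehicle",
    "mouse_device": "device",
}

# Chinese and English lemmas map to shared synsets, which is what makes
# cross-lingual comparison possible in a bilingual WordNet.
LEMMAS = {
    "dog": ["dog"], "狗": ["dog"],
    "cat": ["cat"], "猫": ["cat"],
    "car": ["car"], "汽车": ["car"],
    "mouse": ["mouse_animal", "mouse_device"],
    "鼠标": ["mouse_device"],
}


def path_to_root(synset):
    """Return the synset followed by all of its hypernyms up to the root."""
    path = [synset]
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        path.append(synset)
    return path


def synset_similarity(a, b, alpha=0.2, beta=0.6):
    """Similarity from path distance and depth of the lowest common subsumer.

    alpha and beta are assumed values, not taken from the paper.
    """
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # Lowest common subsumer: first node on a's path that is also on b's path.
    lcs = next(node for node in path_a if node in ancestors_b)
    distance = path_a.index(lcs) + path_b.index(lcs)
    depth = len(path_to_root(lcs)) - 1  # the root has depth 0
    return math.exp(-alpha * distance) * math.tanh(beta * depth)


def word_similarity(word1, word2):
    """Combine the similarities of all synset pairs of the two words.

    Assumed weighting: each pair is weighted in proportion to its own
    similarity, so closely related sense pairs dominate the result.
    """
    scores = [
        synset_similarity(s1, s2)
        for s1 in LEMMAS[word1]
        for s2 in LEMMAS[word2]
    ]
    total = sum(scores)
    if total == 0:
        return 0.0
    return sum(score * score for score in scores) / total


if __name__ == "__main__":
    for pair in [("dog", "狗"), ("dog", "猫"), ("dog", "汽车"), ("mouse", "鼠标")]:
        print(pair, round(word_similarity(*pair), 3))
```

In this toy setup a Chinese word and an English word that share a synset score the same as the synset compared with itself, and the polysemous "mouse" is matched to "鼠标" through its device sense because the unrelated animal sense receives zero weight.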
