A Graph Modeling of Semantic Similarity between Words

Measuring the semantic similarity between pairs of words is considered a fundamental operation in data mining and information retrieval. Nevertheless, developing a computational method whose results come close to human judgments remains difficult, partly because of the subjective nature of similarity. This paper presents a novel algorithm for scoring the semantic similarity between words (SSA). Given two input words w1 and w2, SSA uses their corresponding concepts, relationships, and descriptive glosses available in WordNet to build a rooted weighted graph Gsim. The output score is calculated by exploring the concepts present in Gsim and selecting the minimal distance between any two concepts c1 and c2 of w1 and w2, respectively. The distance combines three quantities: 1) the depth of the nearest common ancestor of c1 and c2 in Gsim, 2) the intersection of the descriptive glosses of c1 and c2, and 3) the shortest distance between c1 and c2 in Gsim. SSA achieves a correlation of 0.913 with the human ratings reported by Miller and Charles (1991) for a dataset of 28 pairs of nouns. Furthermore, on the full dataset of 65 pairs presented by Rubenstein and Goodenough (1965), the correlation between SSA scores and the known human ratings is 0.903, which is higher than that reported for any other algorithm on the same dataset. These high correlations with human ratings suggest that SSA could be useful in solving several data mining and information retrieval problems.
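The abstract names the three ingredients of the distance but not how they are combined, so the following Python sketch is only an illustration under stated assumptions. The toy graph, glosses, and word-to-concept mapping are hypothetical stand-ins for WordNet data, and the combining formula in `concept_distance` is an assumption chosen so that a deeper common ancestor and a larger gloss overlap both shrink the distance; it is not the formula used by SSA.

```python
import heapq
from itertools import product

# Hypothetical toy stand-in for Gsim: child -> {parent: edge weight}.
PARENTS = {
    "entity": {},
    "vehicle": {"entity": 1.0},
    "car": {"vehicle": 1.0},
    "bicycle": {"vehicle": 1.0},
    "fruit": {"entity": 1.0},
    "apple": {"fruit": 1.0},
}
# Hypothetical glosses (not the WordNet text).
GLOSSES = {
    "car": "a motor vehicle with four wheels",
    "bicycle": "a wheeled vehicle with two wheels moved by pedals",
    "apple": "fruit with red or green skin",
}
# Hypothetical mapping from words to their concepts (senses).
WORD_SENSES = {"auto": ["car"], "bike": ["bicycle"], "apple": ["apple"]}


def depth(node):
    """Number of edges on the shortest upward path from node to the root."""
    parents = PARENTS[node]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def ancestors(node):
    """The node itself plus every node reachable by following parent edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def nca_depth(c1, c2):
    """Depth of the deepest ancestor shared by c1 and c2."""
    return max(depth(a) for a in ancestors(c1) & ancestors(c2))


def gloss_overlap(c1, c2):
    """Number of distinct tokens shared by the two glosses."""
    tokens1 = set(GLOSSES.get(c1, "").split())
    tokens2 = set(GLOSSES.get(c2, "").split())
    return len(tokens1 & tokens2)


def shortest_path(src, dst):
    """Dijkstra over the graph with edges treated as undirected."""
    adj = {n: {} for n in PARENTS}
    for child, parents in PARENTS.items():
        for parent, weight in parents.items():
            adj[child][parent] = weight
            adj[parent][child] = weight
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            return dist
        if dist > best.get(node, float("inf")):
            continue
        for nxt, weight in adj[node].items():
            candidate = dist + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return float("inf")


def concept_distance(c1, c2):
    """ASSUMED combination of the three quantities, for illustration only."""
    return shortest_path(c1, c2) / (
        (1 + nca_depth(c1, c2)) * (1 + gloss_overlap(c1, c2))
    )


def word_distance(w1, w2):
    """Minimal distance over all concept pairs of the two words."""
    return min(
        concept_distance(c1, c2)
        for c1, c2 in product(WORD_SENSES[w1], WORD_SENSES[w2])
    )
```

With this toy data, `word_distance("auto", "bike")` comes out smaller than `word_distance("auto", "apple")`, because the first pair shares a deeper common ancestor, more gloss tokens, and a shorter path. A real implementation would read concepts, relations, and glosses from WordNet rather than from hand-written dictionaries.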

[1] Eduard Hovy et al. Multi-Document Person Name Resolution, 2004.

[2] John B. Goodenough et al. Contextual correlates of synonymy, 1965, CACM.

[3] Ted Pedersen et al. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts, 2006.

[4] Philip Resnik et al. Using Information Content to Evaluate Semantic Similarity in a Taxonomy, 1995, IJCAI.

[5] Christiane Fellbaum et al. Combining Local Context and Wordnet Similarity for Word Sense Identification, 1998.

[6] Rada Mihalcea et al. Measuring the Semantic Similarity of Texts, 2005, EMSEE@ACL.

[7] Philip Resnik et al. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language, 1999, J. Artif. Intell. Res.

[8] Iryna Gurevych et al. Semantic Similarity Applied to Spoken Dialogue Summarization, 2004, COLING.

[9] David McLean et al. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources, 2003, IEEE Trans. Knowl. Data Eng.

[10] Ossama Emam et al. Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement, 2006, EMNLP.

[11] Diana McCarthy et al. Relating WordNet Senses for Word Sense Disambiguation, 2006.

[12] G. Miller et al. Contextual correlates of semantic similarity, 1991.

[13] Graeme Hirst et al. Lexical chains as representations of context for the detection and correction of malapropisms, 1995.

[14] Roy Rada et al. Development and application of a metric on semantic nets, 1989, IEEE Trans. Syst. Man Cybern.

[15] David W. Conrath et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[16] David M. W. Powers et al. Measuring Semantic Similarity in the Taxonomy of WordNet, 2005, ACSC.

[17] Jong Wook Kim et al. CP/CV: concept similarity mining without frequency information from domain describing taxonomies, 2006, CIKM '06.

[18] Diana Inkpen et al. Semantic Similarity for Detecting Recognition Errors in Automatic Speech Transcripts, 2005, HLT.

[19] Christiane Fellbaum et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[20] Dekang Lin et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[21] Martha Palmer et al. Verb Semantics and Lexical Selection, 1994, ACL.

[22] Graeme Hirst et al. Evaluating WordNet-based Measures of Lexical Semantic Relatedness, 2006, CL.