Using Information Content to Evaluate Semantic Similarity in a Taxonomy

This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).
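The abstract does not spell out the measure, so the following is only a minimal sketch of how an information-content similarity is commonly formulated: the information content of a concept c is taken as -log p(c), where p(c) is estimated from corpus counts propagated up the IS-A hierarchy, and the similarity of two concepts is the highest information content among the concepts that subsume both. The toy taxonomy, the counts, and the names `PARENT`, `DIRECT_COUNT`, `information_content`, and `similarity` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical corpus counts for words attached directly to each concept.
DIRECT_COUNT = {
    "entity": 0, "animal": 10, "artifact": 10,
    "dog": 30, "cat": 30, "car": 20,
}


def ancestors(concept):
    """Return the concept followed by all of its IS-A ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def class_counts():
    """Add each concept's direct count to itself and to every ancestor."""
    totals = {c: 0 for c in PARENT}
    for concept, n in DIRECT_COUNT.items():
        for a in ancestors(concept):
            totals[a] += n
    return totals


TOTALS = class_counts()
N = TOTALS["entity"]  # the root subsumes everything, so it holds the grand total


def information_content(concept):
    """Information content -log p(c), with p(c) estimated as a relative frequency."""
    return -math.log(TOTALS[concept] / N)


def similarity(c1, c2):
    """Highest information content among the concepts that subsume both inputs."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(similarity(*pair), 4))
```

In this toy setup, "dog" and "cat" share the fairly specific subsumer "animal", so they score higher than "dog" and "car", whose only shared subsumer is the root, which carries no information. An edge-counting measure would instead depend only on path length, regardless of how frequent or specific the shared ancestor is.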
