Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts

In this paper, we introduce a WordNet-based measure of semantic relatedness that combines the structure and content of WordNet with co-occurrence information derived from raw text. We use the co-occurrence information along with the WordNet definitions to build gloss vectors corresponding to each concept in WordNet. Numeric scores of relatedness are assigned to a pair of concepts by measuring the cosine of the angle between their respective gloss vectors. We show that this measure compares favorably to other measures with respect to human judgments of semantic relatedness, and that it performs well when used in a word sense disambiguation algorithm that relies on semantic relatedness. This measure is flexible in that it can compare any two concepts without regard to their part of speech. In addition, it can be adapted to different domains, since any plain text corpus can be used to derive the co-occurrence information.
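As an illustration of the idea described in the abstract, the following minimal sketch builds co-occurrence vectors from a toy corpus, sums them over the words of a definition to form a gloss vector, and scores two concepts by the cosine of the angle between their gloss vectors. The corpus, the glosses, the window size, and the function names (`cooccurrence_vectors`, `gloss_vector`, `cosine`) are assumptions made for this example; they are not taken from the paper, which draws its definitions from WordNet and may weight or filter words differently.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def gloss_vector(gloss, word_vectors):
    """Sum the co-occurrence vectors of the words in a definition (gloss)."""
    total = Counter()
    for word in gloss.lower().split():
        # Words never seen in the corpus contribute nothing.
        total.update(word_vectors.get(word, Counter()))
    return total


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus and hand-written glosses (hypothetical, for illustration only).
corpus = [
    "the bank approved the loan and the deposit",
    "money in the bank earns interest",
    "the river flows past the muddy shore",
    "water from the river floods the shore",
]
word_vectors = cooccurrence_vectors(corpus)

finance = gloss_vector("money deposit loan", word_vectors)
river = gloss_vector("river water shore", word_vectors)

print("finance vs. river:", cosine(finance, river))
print("finance vs. finance:", cosine(finance, finance))
```

Because the vectors are built only from plain text, swapping in a different corpus changes the co-occurrence counts without changing the rest of the procedure, which is the sense in which such a measure can be adapted to a domain.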
