Automatically Creating Datasets for Measures of Semantic Relatedness

Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.

[1]  John B. Goodenough,et al.  Contextual correlates of synonymy , 1965, CACM.

[2]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[3]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[4]  G. Miller,et al.  Contextual correlates of semantic similarity , 1991 .

[5]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[6]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[7]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[8]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[9]  Martin Chodorow,et al.  Combining local context and wordnet similarity for word sense identification , 1998 .

[10]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[11]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[12]  Rada Mihalcea,et al.  Automatic generation of a coarse grained WordNet , 2001, HTL 2001.

[13]  Ehud Rivlin,et al.  Placing search in context: the concept revisited , 2002, TOIS.

[14]  Michael Kluck The GIRT Data in the Evaluation of CLIR Systems - from 1997 Until 2003 , 2003, CLEF.

[15]  Ted Pedersen,et al.  Using Measures of Semantic Relatedness for Word Sense Disambiguation , 2003, CICLing.

[16]  Graeme Hirst,et al.  Non-Classical Lexical Semantic Relations , 2004, Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics - CLS '04.

[17]  Sabine Schulte im Walde,et al.  Identifying Semantic Relations and Functional Properties of Human Verb Associations , 2005, HLT/EMNLP.

[18]  Iryna Gurevych,et al.  Using the Structure of a Conceptual Network in Computing Semantic Relatedness , 2005, IJCNLP.

[19]  David J. Weir,et al.  Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity , 2005, CL.

[20]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[21]  Iryna Gurevych,et al.  Thinking beyond the nouns - computing semantic relatedness across parts of speech , 2006 .