The state of the art in semantic relatedness: a framework for comparison

Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured resources (e.g. WordNet) and collaboratively developed knowledge bases (e.g. Wikipedia). Existing approaches use these resources in different ways, for instance through methods that depend on the path between two words or on a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to those resources to develop a relatedness metric, and the evaluation models used to measure their effectiveness. We have selected 14 representative SR approaches to analyze using our framework. We compare and critically review each of them along the dimensions of our framework, thereby identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
