Semantic association computation: a comprehensive survey

Semantic association computation is the process of quantifying the strength of the semantic connection between two textual units, based on different types of semantic relations. It is a key component of applications in a multitude of fields, such as computational linguistics, cognitive psychology, information retrieval and artificial intelligence, and it has been studied for decades. The aim of this paper is to present a comprehensive survey of approaches for computing semantic associations, categorized according to their underlying sources of background knowledge. Existing surveys have each focused on a specific aspect of semantic associations, such as the use of distributional semantics in association computation or the types of spatial models of semantic associations. This paper, by contrast, brings a multitude of computational aspects and factors together in one picture, which makes it a useful starting point for researchers entering the field of semantic association computation. The paper introduces the fundamental elements of the association computation process, the evaluation methodologies, and the pervasiveness of semantic measures in a variety of fields that rely on natural language semantics. It then discusses in detail the main categories of background knowledge sources, classified as formal and informal knowledge sources, and the underlying design models, such as spatial, combinatorial and network models, that are used in the association computation process. Existing approaches to semantic association computation are classified into two broad categories, based on their use of background knowledge sources: knowledge-rich approaches and knowledge-lean approaches. Each category is divided further into sub-categories, according to the type of underlying knowledge source and the design model of semantic association. A comparative analysis of the strengths and limitations of the approaches belonging to each research stream is also presented. The survey concludes by analyzing the pivotal factors that affect the performance of semantic association measures.
