Paper information - Semantic association computation: a comprehensive survey

Semantic association computation: a comprehensive survey

Semantic association computation is the process of quantifying the strength of a semantic connection between two textual units, based on different types of semantic relations. It is a key component of applications in many fields, such as computational linguistics, cognitive psychology, information retrieval and artificial intelligence, and it has been studied for decades. The aim of this paper is to present a comprehensive survey of approaches for computing semantic associations, categorized according to their underlying sources of background knowledge. Existing surveys on semantic computation have focused on a specific aspect of semantic associations, such as the use of distributional semantics in association computation or types of spatial models of semantic associations. This paper instead brings a multitude of computational aspects and factors into one picture, which makes it a useful starting point for researchers new to the field of semantic association computation. The paper introduces the fundamental elements of the association computation process, evaluation methodologies and the pervasiveness of semantic measures in a variety of fields that rely on natural language semantics. Along the way, it discusses in detail the main categories of background knowledge sources, classified as formal and informal knowledge sources, and the underlying design models, such as spatial, combinatorial and network models, that are used in the association computation process. The paper classifies existing approaches to semantic association computation into two broad categories, based on their utilization of background knowledge sources: knowledge-rich approaches and knowledge-lean approaches. Each category is divided further into sub-categories, according to the type of underlying knowledge sources and design models of semantic association. A comparative analysis of the strengths and limitations of the approaches belonging to each research stream is also presented. The paper concludes the survey by analyzing the pivotal factors that affect the performance of semantic association measures.
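
To make the idea of a knowledge-lean, spatial model concrete, the following is a minimal sketch and not a method taken from the survey. It assumes a toy three-sentence corpus, a symmetric context window of two tokens and raw co-occurrence counts. The names `cooccurrence_vectors`, `cosine` and `toy_corpus` are hypothetical and chosen only for illustration. Each word is represented by the counts of its neighbouring words, and association strength is taken as the cosine between two such vectors.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            # Left and right neighbours, excluding the word itself.
            context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (an assumption for illustration only).
toy_corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the pilot flew the plane",
]
vectors = cooccurrence_vectors(toy_corpus)
doctor_nurse = cosine(vectors["doctor"], vectors["nurse"])
doctor_pilot = cosine(vectors["doctor"], vectors["pilot"])
```

On this toy corpus, `doctor_nurse` comes out higher than `doctor_pilot`, because "doctor" and "nurse" share the context word "treated". A realistic system would typically use a large corpus and a weighting scheme in place of raw counts, which this sketch omits.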
