Measuring Semantic Relatedness with Knowledge Association Network

Measuring the semantic relatedness between two words is a fundamental task for many applications in both the database and natural language processing domains. Conventional methods mainly exploit the latent semantic information in lexical databases (e.g., WordNet) or text corpora (e.g., Wikipedia), computing relatedness either from distances in a lexical tree or from co-occurrence statistics in Wikipedia. However, these methods suffer from low coverage and low precision because (1) a lexical database contains abundant lexical information but lacks semantic information; and (2) in Wikipedia, two related words (e.g., synonyms) may not appear within the same window or sentence, while unrelated words may be mentioned together by chance. To compute semantic relatedness more accurately, other approaches build on free association networks and achieve a significant improvement in relatedness measurement. Nevertheless, they require complex preprocessing of Wikipedia, and the fixed score functions they adopt limit the flexibility and expressiveness of the model. In this paper, we leverage DBpedia and Wikipedia to construct a Knowledge Association Network (KAN), which avoids information extraction from Wikipedia. We propose a flexible and expressive model to represent the entities behind words, in which both the attribute information and the topological structure of entities are embedded in a vector space simultaneously. Experimental results on standard datasets show that our model is more effective than previous models.
