An Efficient Approach for Measuring Semantic Similarity Combining WordNet and Wikipedia

The measurement of semantic similarity between concepts is an important research topic in natural language processing. Several approaches for measuring the semantic similarity between concepts have been proposed based on WordNet or Wikipedia. However, in most methods, improvements in measurement accuracy have come at the cost of a dramatic increase in time complexity, and the existing methods do not effectively integrate WordNet and Wikipedia. In this paper, we focus on designing an efficient semantic similarity method based on WordNet and Wikipedia. First, to improve the accuracy of WordNet edge-based measures, we propose an edge weight model that combines edge and density information by assigning a weight to each edge adaptively, based on the number of direct hyponyms of the subsumer. Second, to improve the computational efficiency of the existing Wikipedia link vector-based measures, we propose a new Wikipedia link feature-based semantic similarity method that converts Wikipedia links into semantic knowledge and replaces the TF-IDF statistical weight model used in the existing measures. In addition, we propose two new word disambiguation strategies to further improve the accuracy of Wikipedia link-based measures. Finally, to fully exploit the advantages of WordNet and Wikipedia, we propose two new aggregation schemas that combine WordNet “is-a” semantics with Wikipedia link semantics, replacing the current aggregation schemas that combine WordNet “is-a” semantics with category semantics in Wikipedia. The experimental results show that our aggregation models compare favorably with state-of-the-art similarity measures in terms of accuracy, efficiency and word coverage.
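To make the idea of a link feature-based measure concrete, the following is a minimal sketch, not the method of this paper. It assumes each concept is represented by the set of Wikipedia article titles it links to, and it scores two concepts with a Tversky-style ratio over those sets, so no TF-IDF weights are needed. The function name `link_feature_similarity`, the parameters `alpha` and `beta`, and the example link sets are all hypothetical.

```python
def link_feature_similarity(links_a, links_b, alpha=0.5, beta=0.5):
    """Tversky-style ratio over two sets of link targets.

    Hypothetical illustration: shared links count as common features,
    unshared links as distinguishing features weighted by alpha and beta.
    """
    a, b = set(links_a), set(links_b)
    common = len(a & b)   # links both concepts share
    only_a = len(a - b)   # links unique to the first concept
    only_b = len(b - a)   # links unique to the second concept
    denom = common + alpha * only_a + beta * only_b
    # Two concepts with no links at all are treated as unrelated.
    return common / denom if denom else 0.0


# Made-up link sets, for illustration only.
car = {"Vehicle", "Engine", "Wheel"}
automobile = {"Vehicle", "Engine", "Road"}
score = link_feature_similarity(car, automobile)
```

Setting `alpha = beta = 1` reduces the ratio to the Jaccard index. Because the score depends only on set intersections and differences, it can be computed in time linear in the number of links, which is the kind of saving a feature-based measure offers over weighted link vectors.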
