A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet

Information Content (IC) based measures of semantic similarity are gaining popularity because information theory characterizes the semantics of concepts well. The IC of a concept quantifies its generality and concreteness, and so provides a dimension for better understanding concept semantics. The conventional way of calculating IC relies on the probability of a concept appearing in corpora. Because those approaches suffer from data sparseness and corpus dependency, corpus-independent intrinsic IC measures have been developed, and they outperform the conventional measures. In this paper we analyze several intrinsic IC models, highlight their issues, and present a novel information-theoretic intrinsic model that calculates the IC of concepts based solely on the underlying ontology. We focus on several topological structures of the underlying ontology, on which the accuracy of an intrinsic IC measure depends heavily. Our approach is evaluated on a benchmark data set and compared with corpus-based and intrinsic IC-based methods. Experimental results show that our intrinsic IC model achieves significantly better results than the existing techniques.
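
To make the contrast between corpus-based and intrinsic IC concrete, the following minimal sketch computes both on a toy taxonomy. It is an illustration only and not the model proposed in this paper: the corpus-based function uses the standard negative log probability definition, and the intrinsic function uses the hyponym-count formulation of Seco, Veale and Hayes (2004). The taxonomy, the counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy (parent -> children); not taken from WordNet.
TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": [],
    "dog": [],
    "cat": [],
}


def descendants(concept):
    """Return all proper hyponyms (descendants) of a concept."""
    found = set()
    stack = list(TAXONOMY[concept])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(TAXONOMY[node])
    return found


def corpus_ic(concept, counts):
    """Corpus-based IC: -log p(c).

    An occurrence of any hyponym also counts as an occurrence of c,
    so the root has probability 1 and IC 0.
    """
    total = sum(counts.values())
    freq = counts.get(concept, 0) + sum(counts.get(d, 0) for d in descendants(concept))
    if freq == 0:
        # Data sparseness: a concept unseen in the corpus has no finite IC.
        return math.inf
    return -math.log(freq / total)


def intrinsic_ic(concept):
    """Intrinsic IC from taxonomy structure alone (hyponym-count formulation):
    1 - log(hypo(c) + 1) / log(number of concepts in the taxonomy).
    """
    return 1.0 - math.log(len(descendants(concept)) + 1) / math.log(len(TAXONOMY))


if __name__ == "__main__":
    counts = {"dog": 3, "plant": 1}  # hypothetical corpus frequencies
    for c in TAXONOMY:
        print(c, corpus_ic(c, counts), intrinsic_ic(c))
```

With these assumed counts, "cat" never occurs in the corpus, so its corpus-based IC is undefined (infinite here), while its intrinsic IC is still available from the taxonomy. This is the data sparseness problem that motivates intrinsic measures.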
