A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet.

In a recent paper, we introduced a new family of Information Content (IC) models based on the estimation of the conditional probability between child and parent concepts. This work is motivated by the finding of two drawbacks in the computational method of that family of IC models, as well as two further gaps in the literature. The drawbacks are that two of our cognitive IC models do not satisfy the axiom that constrains the sum of probabilities on the leaf nodes to be 1, and that some ontologies with multiple inheritance could prevent the IC model from satisfying the growing monotonicity axiom in concepts with multiple parents. The first gap in the literature is the lack of a complete and updated experimental survey including a pairwise statistical significance analysis between most IC models and ontology-based similarity measures. The second gap is the lack of replication and confirmation of previous methods and results in most works. These two gaps are especially significant in the current state of the problem, in which there is no convincing winner within the family of intrinsic IC-based similarity measures and the performance margin is very narrow. In order to bridge the aforementioned drawbacks and gaps, this paper introduces the following contributions: (1) a refinement of our recent family of well-founded Information Content (IC) models; (2) eight new intrinsic IC models and one new corpus-based IC model; and (3) a very detailed experimental survey of ontology-based similarity measures and Information Content (IC) models on WordNet, including the evaluation and statistical significance analysis, on the five most significant datasets, of most ontology-based similarity measures and all WordNet-based IC models reported in the literature, with the only exception of the IC models recently introduced by Harispe et al. (2015a) and Ben Aouicha et al. (2016b).
The evaluation is entirely based on a Java software library called HESML, which has been developed by the authors in order to replicate all methods evaluated herein. The new IC models obtain results that rival the state-of-the-art methods and improve on our previous models, whilst the experimental survey allows a detailed and conclusive image of the state of the problem to be drawn by setting the new state of the art and quantifying the main achievements of the last three decades.
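
The two axioms mentioned above can be illustrated with a minimal sketch. It is not the method of this paper nor HESML code: the toy taxonomy, the leaf probabilities and all function names are hypothetical, and the sketch assumes the standard definition IC(c) = -log p(c), reading the growing monotonicity axiom as "a child concept is never less informative than any of its parents".

```python
import math

# Hypothetical toy taxonomy with multiple inheritance: node -> list of parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],  # concept with two parents
    "e": ["b"],
}


def children_of(parents):
    """Invert the parent map into a node -> children map."""
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)
    return children


def leaves(parents):
    """Nodes without children."""
    children = children_of(parents)
    return sorted(node for node in parents if not children[node])


def subsumed_leaves(parents, node):
    """Leaves reachable from `node` (the node itself if it is a leaf)."""
    children = children_of(parents)
    stack, found = [node], set()
    while stack:
        current = stack.pop()
        if not children[current]:
            found.add(current)
        stack.extend(children[current])
    return found


def leaf_sum_is_one(prob, parents):
    """Axiom 1: the probabilities of the leaf nodes sum to 1."""
    return math.isclose(sum(prob[leaf] for leaf in leaves(parents)), 1.0)


def is_monotone(prob, parents, eps=1e-12):
    """Axiom 2: p(child) <= p(parent) on every edge, so IC never decreases."""
    return all(
        prob[child] <= prob[parent] + eps
        for child, node_parents in parents.items()
        for parent in node_parents
    )


# Assumed leaf probabilities; inner nodes take the mass of the leaves they subsume.
LEAF_PROB = {"c": 0.5, "d": 0.3, "e": 0.2}
PROB = {
    node: sum(LEAF_PROB[leaf] for leaf in subsumed_leaves(PARENTS, node))
    for node in PARENTS
}
IC = {node: -math.log(p) for node, p in PROB.items()}

# A deliberately broken model: "d" is more probable than its parent "b".
BAD_PROB = dict(PROB, d=0.6)

if __name__ == "__main__":
    print(leaf_sum_is_one(PROB, PARENTS), is_monotone(PROB, PARENTS))
    print(leaf_sum_is_one(BAD_PROB, PARENTS), is_monotone(BAD_PROB, PARENTS))
```

In this construction both checks hold by design, whereas the altered model fails both, which is the kind of violation the refinement described above is meant to rule out.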
