Investigating Ontological Similarity Theoretically with Fuzzy Set Theory, Information Content, and Tversky Similarity and Empirically with the Gene Ontology

This paper investigates ontological similarity both theoretically and empirically. Tversky's parameterized ratio model of similarity [3] is shown to be a unifying basis for many well-known ontological similarity measures. A new family of ontological similarity measures is proposed that parameterizes the characteristic set used to represent an ontological concept. The three subontologies of the Gene Ontology (GO) are used to empirically investigate several ontological similarity measures, including a new measure derived from the proposed family. The correlation among the measures is discussed in detail, and the effects of two methods of determining a concept's information content, corpus-based and ontology-based, are compared.
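
As a rough illustration of the ratio model mentioned in the abstract, the sketch below computes Tversky's parameterized similarity, S(A, B) = f(A ∩ B) / (f(A ∩ B) + α f(A − B) + β f(B − A)), over two feature sets. It is not the paper's implementation: the function name `tversky`, the choice of set cardinality for f, the return value for two empty sets, and the toy ancestor sets are all assumptions made for this example.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky's ratio model with set cardinality as the salience function f.

    a, b   -- iterables of features describing each concept (assumed here
              to be ancestor terms; the paper parameterizes this choice)
    alpha  -- weight on features found only in a
    beta   -- weight on features found only in b
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Assumption: two concepts with no features at all are treated as identical.
    return common / denom if denom else 1.0


# Hypothetical ancestor sets for two concepts (illustrative only).
concept_a = {"root", "binding", "protein binding"}
concept_b = {"root", "binding", "ion binding", "metal ion binding"}

jaccard_like = tversky(concept_a, concept_b, alpha=1.0, beta=1.0)
dice_like = tversky(concept_a, concept_b, alpha=0.5, beta=0.5)
print(jaccard_like, dice_like)
```

With α = β = 1 the ratio reduces to the Jaccard coefficient, and with α = β = 0.5 it reduces to the Dice coefficient, which is one way a single parameterized form can cover several familiar measures. Replacing cardinality with a sum of information content values over the features would be another choice of f.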

[1] Phillip W. Lord, et al. Semantic Similarity in Biomedical Ontologies, 2009, PLoS Comput. Biol.

[2] V. Cross, et al. Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications, 2010.

[3] Dong Xu, et al. Data Mining in Biomedicine Using Ontologies, 2009.

[4] Nuno Seco, et al. Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content, 2008, OTM Conferences.

[5] Dekang Lin, et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[6] V. Cross, et al. Tversky's Parameterized Similarity Ratio Model: A Basis for Semantic Relatedness, 2006, NAFIPS 2006 - 2006 Annual Meeting of the North American Fuzzy Information Processing Society.

[7] Graeme Hirst, et al. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures, 2004.

[8] F. Attneave, et al. Dimensions of similarity, 1950, The American Journal of Psychology.

[9] Jérôme Euzenat, et al. A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness, 2010, SEMWEB.

[10] Maya R. Gupta, et al. Information-theoretic and Set-theoretic Similarity, 2006, 2006 IEEE International Symposium on Information Theory.

[11] Martha Palmer, et al. Verb Semantics and Lexical Selection, 1994, ACL.

[12] A. Tversky. Features of Similarity, 1977.

[13] Philip Resnik, et al. Using Information Content to Evaluate Semantic Similarity in a Taxonomy, 1995, IJCAI.

[14] Max J. Egenhofer, et al. Determining Semantic Similarity among Entity Classes from Different Ontologies, 2003, IEEE Trans. Knowl. Data Eng.

[15] Yi Sun, et al. Semantic, Fuzzy Set and Fuzzy Measure Similarity for the Gene Ontology, 2007, 2007 IEEE International Fuzzy Systems Conference.

[16] N. Goodman. Problems and projects, 1979.

[17] David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[18] Carole A. Goble, et al. Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation, 2003, Bioinform.

[19] Tharam S. Dillon, et al. On the Move to Meaningful Internet Systems, OTM 2010, 2010, Lecture Notes in Computer Science.

[20] D. Gentner, et al. Respects for similarity, 1993.

[21] Tony Veale, et al. An Intrinsic Information Content Metric for Semantic Similarity in WordNet, 2004, ECAI.

[22] Pat Langley, et al. Editorial: On Machine Learning, 1986, Machine Learning.

[23] Xinran Yu, et al. Mathematical and Experimental Investigation of Ontological Similarity Measures and Their Use in Biomedical Domains, 2010.