gSemSim: Semantic Similarity Measure for Intra Gene Ontology Terms

Gene Ontology (GO) is an important bioinformatics scheme that unifies the representation of gene and gene product attributes across all species. Measuring the similarity or distance between GO terms is a key step in uncovering hidden relationships between genes, and such similarity is a common ingredient of knowledge-discovery tasks. Various similarity measures between GO terms have been proposed in the literature. We introduce a novel similarity measure scheme that improves on three conventional similarity measures by reducing their limitations. The salient feature of the proposed GO Semantic Similarity (gSemSim) measure is its ability to express more realistic similarity between concepts from the perspective of domain knowledge. A comparison with other techniques is also presented, showing that the proposed measure better captures the contextual meaning of GO terms. This study is expected to assist the bioinformatics community in selecting better similarity measures for the correct annotation of genes in the Gene Ontology.
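The abstract does not name the three conventional measures that gSemSim improves on. As background only, the sketch below shows two widely used information-content (IC) based measures between GO terms: Resnik's measure (the IC of the most informative common ancestor, MICA) and Lin's measure (2·IC(MICA) / (IC(t1) + IC(t2))). It is not the gSemSim measure. The tiny GO-like DAG, the term names, the annotation counts and all helper functions are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Set

# Hypothetical GO-like DAG (child -> parents); these are NOT real GO terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS: Dict[str, int] = {"root": 0, "A": 2, "B": 2, "A1": 3, "A2": 1, "B1": 2}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def cumulative_counts() -> Dict[str, int]:
    """Propagate each annotation up to every ancestor (true-path rule)."""
    totals = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += n
    return totals


TOTALS = cumulative_counts()
ROOT_TOTAL = TOTALS["root"]


def ic(term: str) -> float:
    """IC(t) = -log p(t) = log(total / count(t)); assumes count(t) > 0."""
    return math.log(ROOT_TOTAL / TOTALS[term])


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor (MICA)."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic(c) for c in common)


def lin(t1: str, t2: str) -> float:
    """Lin's measure: shared information normalised by each term's IC."""
    denom = ic(t1) + ic(t2)
    if denom == 0:
        return 1.0  # both terms are the root; treat as identical
    return 2 * resnik(t1, t2) / denom
```

For example, `lin("A1", "A2")` lies strictly between 0 and 1 because the two terms share the informative ancestor `A`, whereas `lin("A1", "B1")` is 0 because their only common ancestor is the root, whose IC is 0. Measures of this kind depend only on the DAG and the annotation statistics, which is one reason context-aware alternatives have been explored.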
