Incorporating Semantic Similarity Measure in Genetic Algorithm: An Approach for Searching the Gene Ontology Terms

The most important components of the Gene Ontology are its terms. These controlled vocabularies are defined to provide consistent descriptions of gene products that are shareable and computationally accessible to humans, software agents, and other consumers of machine-readable metadata. Each term is associated with information such as a definition, synonyms, database references, amino acid sequences, and relationships to other terms. This information has led to the Gene Ontology being widely applied in microarray and proteomic analysis. However, terms are still searched with a traditional approach based on keyword matching. This approach has two weaknesses: it ignores semantic relationships between terms, and it depends heavily on a specialist to find similar terms. Therefore, this study combines a semantic similarity measure with a genetic algorithm to improve the retrieval of semantically similar terms. The semantic similarity measure computes the strength of similarity between two terms. The genetic algorithm is then employed to perform batch retrievals and to cope with the large search space of the Gene Ontology graph. Computational results are presented to show the effectiveness of the proposed algorithm.
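The abstract does not say which similarity measure or chromosome encoding is used, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes Lin's information-content measure (Resnik and Jiang-Conrath are common alternatives), a hypothetical toy is_a DAG with made-up term names and annotation counts, and a hypothetical GA in which a chromosome is a batch of k distinct terms, fitness is the summed similarity to a query term, selection is by tournament, crossover samples from the union of two parents, mutation swaps in an unused term, and the best batch is kept by elitism.

```python
import math
import random
from itertools import combinations

# Hypothetical toy is_a DAG (term -> parents); names are illustrative, not GO IDs.
PARENTS = {
    "root": [],
    "A": ["root"], "B": ["root"],
    "A1": ["A"], "A2": ["A"], "B1": ["B"], "B2": ["B"],
    "A1B1": ["A1", "B1"],  # multiple inheritance, as a DAG allows
    "A2B2": ["A2", "B2"],
}
# Hypothetical counts of gene products annotated directly to each term.
DIRECT = {"root": 1, "A": 4, "B": 3, "A1": 6, "A2": 5,
          "B1": 2, "B2": 3, "A1B1": 7, "A2B2": 4}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


ANC = {t: ancestors(t) for t in PARENTS}
# An annotation to a term also counts toward every ancestor of that term.
CUMULATIVE = {t: sum(n for d, n in DIRECT.items() if t in ANC[d]) for t in PARENTS}
IC = {t: -math.log(CUMULATIVE[t] / CUMULATIVE["root"]) for t in PARENTS}


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    denom = IC[a] + IC[b]
    if denom == 0:  # both terms are the root
        return 1.0
    return 2 * max(IC[t] for t in ANC[a] & ANC[b]) / denom


def fitness(query, batch):
    """Total similarity of a retrieved batch of terms to the query term."""
    return sum(lin_similarity(query, t) for t in batch)


def ga_search(query, k=3, pop_size=20, generations=30, p_mut=0.3, seed=0):
    """Evolve batches of k distinct terms that are semantically close to query."""
    rng = random.Random(seed)
    candidates = [t for t in PARENTS if t != query]
    population = [rng.sample(candidates, k) for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if fitness(query, a) >= fitness(query, b) else b

    best = max(population, key=lambda b: fitness(query, b))
    for _ in range(generations):
        children = [best]  # elitism: carry the best batch forward unchanged
        while len(children) < pop_size:
            # Crossover: draw k distinct terms from the union of two parents.
            pool = sorted(set(tournament()) | set(tournament()))
            child = rng.sample(pool, k)
            # Mutation: replace one term with a candidate not already in the batch.
            if rng.random() < p_mut:
                outside = [t for t in candidates if t not in child]
                child[rng.randrange(k)] = rng.choice(outside)
            children.append(child)
        population = children
        best = max(population, key=lambda b: fitness(query, b))
    return best, fitness(query, best)


def exhaustive_best(query, k=3):
    """Brute-force optimum, feasible only because the toy graph is tiny."""
    cands = [t for t in PARENTS if t != query]
    return max(fitness(query, c) for c in combinations(cands, k))


if __name__ == "__main__":
    batch, score = ga_search("A1B1", k=3)
    print(sorted(batch), round(score, 3))
    print("exhaustive optimum:", round(exhaustive_best("A1B1", 3), 3))
```

On a toy graph this small, exhaustive search is cheap, so the GA result can be compared against it directly; the GA only becomes worthwhile when the number of candidate batches grows combinatorially, as it would over the full Gene Ontology graph.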
