A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences

This study introduces a genetic similarity algorithm for finding groups of semantically similar Gene Ontology terms. The algorithm combines a semantic similarity measure with a parallel genetic algorithm. The semantic similarity measure computes the strength of similarity between Gene Ontology terms, and the parallel genetic algorithm performs batch retrieval and accelerates the search through the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in a Gene Ontology browser named basic UTMGO, which is intended to overcome the weaknesses of existing Gene Ontology browsers that rely on conventional keyword matching. To show the applicability of basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of extended UTMGO is to provide a simple, practical tool that produces better results within a reasonable running time and at low computing cost, specifically for offline usage. Computational results and comparisons with related tools are presented to show the effectiveness of the proposed algorithm and tools.
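
To make the combination of a semantic similarity measure and a parallel genetic algorithm more concrete, the following is a minimal Python sketch, not the UTMGO implementation. It rests on several assumptions that the abstract does not state: the similarity measure is Resnik's information-content measure, the parallel genetic algorithm is an island model with ring migration (run sequentially here for simplicity), and the ontology, annotation counts, and all function and parameter names (`resnik`, `fitness`, `evolve`, `islands`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical toy ontology: child -> parents (a DAG, like the Gene Ontology graph).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
    "protein_kinase": ["kinase"],
    "dna_kinase": ["kinase", "dna_binding"],
}
# Made-up direct annotation counts per term (illustration only).
COUNTS = {"root": 0, "binding": 2, "catalysis": 1, "dna_binding": 5,
          "rna_binding": 4, "kinase": 3, "protein_kinase": 6, "dna_kinase": 2}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


TERMS = sorted(PARENTS)
ANC = {t: ancestors(t) for t in TERMS}
TOTAL = sum(COUNTS.values())
# A term's frequency includes the annotations of all of its descendants.
FREQ = {t: sum(COUNTS[d] for d in TERMS if t in ANC[d]) for t in TERMS}
IC = {t: -math.log(FREQ[t] / TOTAL) for t in TERMS}


def resnik(a, b):
    """Assumed measure: information content of the most informative common ancestor."""
    return max(IC[c] for c in ANC[a] & ANC[b])


def fitness(group, query):
    """Total similarity between the query term and every term in the group."""
    return sum(resnik(query, t) for t in group)


def evolve(query, k=2, islands=3, pop_size=6, generations=20, seed=0):
    """Island-model GA: search for a group of k terms similar to the query."""
    rng = random.Random(seed)
    pool = [t for t in TERMS if t != query]

    def random_group():
        return tuple(sorted(rng.sample(pool, k)))

    def crossover(a, b):
        # Child draws k distinct terms from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

    def mutate(group):
        genes = list(group)
        if rng.random() < 0.3:
            genes[rng.randrange(k)] = rng.choice([t for t in pool if t not in genes])
        return tuple(sorted(genes))

    pops = [[random_group() for _ in range(pop_size)] for _ in range(islands)]
    for gen in range(generations):
        for i, pop in enumerate(pops):
            pop.sort(key=lambda g: fitness(g, query), reverse=True)
            elite = pop[: pop_size // 2]  # elitism: keep the better half
            children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                        for _ in range(pop_size - len(elite))]
            pops[i] = elite + children
        if gen % 5 == 4:  # ring migration: each island receives its neighbour's best
            bests = [max(p, key=lambda g: fitness(g, query)) for p in pops]
            for i in range(islands):
                pops[i][-1] = bests[i - 1]
    return max((g for p in pops for g in p), key=lambda g: fitness(g, query))


if __name__ == "__main__":
    best = evolve("protein_kinase")
    print(best, round(fitness(best, "protein_kinase"), 3))
```

In a real setting each island would run in its own process or on its own machine, and the fitness function would score candidate groups against a batch of query terms rather than a single one; the sequential loop above only illustrates the structure.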
