DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection

Background: Orthologs are genes derived from the same ancestral gene locus after a speciation event. Orthologous proteins usually have similar sequences and perform comparable biological functions, so ortholog identification is useful for annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships among all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. An efficient ortholog detection method that can handle a large number of distantly related genomes is therefore desirable.

Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), was created on the basis of domain architectures. Because domain composition is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps: starting from domain information, it first assigns proteins to groups according to their domain architectures, and then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in considerably less time than the traditional reciprocal best hits method, and the advantage becomes more pronounced when a large number of genomes are analyzed. The output of DODO is highly comparable with that of other known ortholog databases.

Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
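
The two-step idea described in the Results can be illustrated with a minimal sketch. This is not DODO's actual code: the function names, the domain labels, the toy sequences, and the position-identity score (standing in for a real alignment score) are all assumptions made for illustration. Proteins are first grouped by identical domain architecture, and reciprocal best hits are then searched only within each group, so far fewer pairwise comparisons are needed than in an all-against-all search.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Step 1: bucket proteins by domain architecture, then by genome.

    proteins maps protein_id -> (genome, ordered tuple of domain labels).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for pid, (genome, domains) in proteins.items():
        groups[tuple(domains)][genome].append(pid)
    return groups


def best_hit(query, candidates, score):
    """Return the candidate with the highest score against the query."""
    return max(candidates, key=lambda c: score(query, c))


def orthologs_within_groups(proteins, genome_a, genome_b, score):
    """Step 2: reciprocal best hits, restricted to each architecture group."""
    pairs = []
    for by_genome in group_by_architecture(proteins).values():
        a_ids = by_genome.get(genome_a, [])
        b_ids = by_genome.get(genome_b, [])
        if not a_ids or not b_ids:
            continue  # architecture is absent from one of the genomes
        for a in a_ids:
            b = best_hit(a, b_ids, score)
            if best_hit(b, a_ids, score) == a:
                pairs.append((a, b))
    return sorted(pairs)


# Hypothetical toy data: two genomes, "A" and "B".
proteins = {
    "A1": ("A", ("kinase",)),
    "A2": ("A", ("kinase",)),
    "A3": ("A", ("SH2", "kinase")),
    "B1": ("B", ("kinase",)),
    "B2": ("B", ("SH2", "kinase")),
    "B3": ("B", ("WD40",)),
}
sequences = {
    "A1": "MKVLAA", "A2": "MKTTTT", "A3": "GGSSLL",
    "B1": "MKVLAG", "B2": "GGSSLV", "B3": "WWWWWW",
}


def identity_score(x, y):
    """Toy similarity: number of identical positions (not a real alignment)."""
    return sum(p == q for p, q in zip(sequences[x], sequences[y]))


pairs = orthologs_within_groups(proteins, "A", "B", identity_score)
print(pairs)  # [('A1', 'B1'), ('A3', 'B2')]
```

In this toy case, A2 shares an architecture with B1 but is not B1's best hit, so it is left unpaired, and B3 is never compared with anything because no protein in genome A has its architecture.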
