OrthoInspector: comprehensive orthology analysis and visual exploration

Background: The accurate determination of orthology and inparalogy relationships is essential for comparative sequence analysis, functional gene annotation and evolutionary studies. Various methods have been developed, based on simple BLAST all-versus-all pairwise comparisons, on time-consuming phylogenetic tree analyses, or on both.

Results: We have developed OrthoInspector, a new software system incorporating an original algorithm for the rapid detection of orthology and inparalogy relations between different species. In comparisons with existing methods, OrthoInspector improves detection sensitivity, with a minimal loss of specificity. In addition, several visualization tools have been developed to facilitate in-depth studies based on these predictions. The software has been used to study the orthology/inparalogy relationships for a large set of 940,855 protein sequences from 59 different eukaryotic species.

Conclusion: OrthoInspector is a new software system for orthology/paralogy analysis. It is available as an independent software suite that can be downloaded and installed for local use. Command-line querying facilitates integration of the software into high-throughput processing pipelines, and a graphical interface provides easy, intuitive access to results for the non-expert.
