Orthology prediction methods: A quality assessment using curated protein families

The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests are therefore required to evaluate the accuracy of these predictions and to explore biases caused by biological and technical factors. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. In the latter part of the analysis, genome annotation emerged as the largest single influencing factor, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols. This task is of utmost importance for many fields of biology and should be tackled by a broad scientific community.
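The benchmark compares each method's predicted orthologous groups against the curated families. A method can recover the right group and still get the gene count wrong, either by missing curated members or by adding extra genes. The sketch below shows one such per-family comparison. It rests on a few assumptions: every group is a set of gene identifiers, each curated family is matched to the predicted group with the largest overlap (a hypothetical matching rule, not necessarily the one used in the study), and all function names and gene identifiers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupComparison:
    """Outcome of comparing one curated family with its best-matching predicted group."""
    exact: bool
    missing: frozenset  # curated genes absent from the predicted group
    extra: frozenset    # predicted genes absent from the curated family


def best_match(reference: set, predicted_groups: list) -> set:
    """Hypothetical matching rule: the predicted group sharing the most genes.

    Returns an empty set if no predicted group overlaps the reference at all.
    """
    overlapping = [g for g in predicted_groups if reference & g]
    return max(overlapping, key=lambda g: len(reference & g), default=set())


def compare(reference: set, predicted_groups: list) -> GroupComparison:
    predicted = best_match(reference, predicted_groups)
    missing = frozenset(reference - predicted)
    extra = frozenset(predicted - reference)
    return GroupComparison(exact=not missing and not extra, missing=missing, extra=extra)


def fraction_exact(references: dict, predicted_groups: list) -> float:
    """Share of curated families whose best match contains exactly the same genes."""
    results = [compare(ref, predicted_groups) for ref in references.values()]
    return sum(r.exact for r in results) / len(results)


if __name__ == "__main__":
    # Toy data with hypothetical gene identifiers, not taken from the benchmark.
    curated = {
        "famA": {"human_1", "mouse_1", "fly_1"},
        "famB": {"human_2", "mouse_2"},
    }
    predicted = [
        {"human_1", "mouse_1", "fly_1"},   # same genes as famA
        {"human_2", "mouse_2", "worm_9"},  # famB plus one extra gene
    ]
    for name, ref in curated.items():
        print(name, compare(ref, predicted))
    print("fraction exact:", fraction_exact(curated, predicted))
```

In this toy run, famB is placed in the right group but has one gene too many. That is the kind of error the abstract says affects half of the curated groups. Keeping the missing and extra genes separate, instead of returning a single score, shows whether a method tends to split families or to merge them.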
