eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, which doubles the number of genes with an orthology assignment relative to eggNOG v2. The new release results from several improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 selected taxonomic levels, enabling much more fine-grained orthology assignments; and (iii) the redesigned website is considerably faster and offers more functionality. In total, eggNOG v3 contains 721 801 orthologous groups encompassing 4 396 591 genes. Additionally, we updated 4873 original COGs and 4850 original KOGs to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the remaining groups apply at 40 more limited taxonomic ranges. Each group is supplemented with a multiple sequence alignment and a maximum-likelihood tree, and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).
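The counts quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is only illustrative: the constant and function names are hypothetical, and it assumes that the 721 801 total includes the 101 208 universal-level groups, as the wording "the others" suggests.

```python
# Figures quoted in the abstract; all names below are illustrative only.
TOTAL_GROUPS = 721_801       # orthologous groups across all levels
ANNOTATED_GROUPS = 450_904   # groups with a broad functional description
UNIVERSAL_GROUPS = 101_208   # groups at the universal level
TAXONOMIC_LEVELS = 41        # universal level plus more limited ranges


def summarize_counts():
    """Derive the secondary figures implied by the abstract's counts."""
    return {
        # Share of groups that carry a functional description, in percent.
        "annotated_pct": 100 * ANNOTATED_GROUPS / TOTAL_GROUPS,
        # Levels other than the universal one.
        "restricted_levels": TAXONOMIC_LEVELS - 1,
        # Groups defined at those more limited levels (assumes the total
        # includes the universal-level groups).
        "restricted_groups": TOTAL_GROUPS - UNIVERSAL_GROUPS,
    }


if __name__ == "__main__":
    for name, value in summarize_counts().items():
        print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
```

Under that assumption, the annotated share rounds to the stated 62.5%, and 620 593 groups fall at the 40 more limited taxonomic ranges.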
