PhyloPro2.0: a database for the dynamic exploration of phylogenetically conserved proteins and their domain architectures across the Eukarya

PhyloPro is a database and accompanying web-based application for the construction and exploration of phylogenetic profiles across the Eukarya. In this update article, we present six major new developments in PhyloPro: (i) integration of Pfam-A domain predictions for all proteins; (ii) new summary heatmaps and detail-level views of domain conservation; (iii) an interactive, network-based visualization tool for exploring domain architectures and their conservation; (iv) the ability to browse by protein functional category (GOSlim); (v) improvements to the web interface that enhance drill-down capability from the heatmap view; and (vi) improved coverage, including 164 eukaryotes and 12 reference species. In addition, we provide improved support for downloading data and images in a variety of formats. Among the existing tools for phylogenetic profiling, PhyloPro provides several innovative domain-based features, including a novel domain adjacency visualization tool. These features are designed to allow the user to identify and compare proteins with similar domain architectures across species and thus develop hypotheses about the evolution of lineage-specific trajectories.

Database URL: http://www.compsysbio.org/phylopro/
