A complex-centric view of protein network evolution

The recent availability of protein–protein interaction networks for several species makes it possible to study protein complexes in an evolutionary context. In this article, we present a novel network-based framework for reconstructing the evolutionary history of protein complexes. Our analysis is based on generalizing evolutionary measures for single proteins to the level of whole subnetworks, comprehensively considering a broad set of computationally derived complexes and accounting for both sequence and interaction changes. Specifically, we compute sets of orthologous complexes across species, and use these to derive evolutionary rate and age measures for protein complexes. We observe significant correlations between the evolutionary properties of a complex and those of its member proteins, suggesting that protein complexes form early in evolution and evolve as coherent units. Additionally, our approach enables us to directly quantify the extent to which gene duplication has played a role in the evolution of complexes. We find that about one quarter of the sets of orthologous complexes have originated from evolutionary cores of homodimers that underwent duplication and divergence, testifying to the important role of gene duplication in protein complex evolution.
