Assembly rules for protein networks derived from phylogenetic-statistical analysis of whole genomes

Background: We analyse a network of functionally linked proteins identified from a phylogenetic-statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links.

Results: The eukaryotic correlated-evolution network we derive displays the familiar power-law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' (the phenomenon of newly acquired connections being more likely to be made to proteins that already have large numbers of connections) influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions, independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment.

Conclusion: A few simple rules can explain the topological structure of, and evolutionary changes to, protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. Given these rules of assembly, power-law-scaled networks emerge naturally from simple principles of selection, yielding protein-interaction networks that retain a high degree of robustness on short time scales and evolvability on longer evolutionary time scales.
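The idea behind a 'variable rate of attachment' can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes, purely for illustration, that each protein has a fixed attachment propensity drawn from a lognormal distribution, and that link endpoints are chosen in proportion to those fixed propensities. The function names `simulate_degrees` and `lognormal_rates`, the network size, and the value of `sigma` are all hypothetical choices. Because the propensities never depend on current connectivity, there is no preferential attachment in either run.

```python
import random
from statistics import pvariance


def simulate_degrees(rates, n_links, seed=0):
    """Draw up to n_links links; endpoints are chosen in proportion to rates.

    The rates are fixed, so a protein's current degree has no effect on
    its chance of gaining the next link (no preferential attachment).
    """
    rng = random.Random(seed)
    n = len(rates)
    # Draw all endpoints at once, then pair them up into candidate links.
    ends = rng.choices(range(n), weights=rates, k=2 * n_links)
    edges = set()
    for a, b in zip(ends[0::2], ends[1::2]):
        if a != b:  # ignore self-links
            edges.add((min(a, b), max(a, b)))  # the set drops duplicate links
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def lognormal_rates(n, sigma=1.5, seed=1):
    """Assumed propensity distribution (an illustrative choice only)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    n, n_links = 2000, 6000
    uniform = simulate_degrees([1.0] * n, n_links)
    variable = simulate_degrees(lognormal_rates(n), n_links)
    for name, deg in (("equal rates", uniform), ("variable rates", variable)):
        print(name, "max degree:", max(deg),
              "variance:", round(pvariance(deg), 1))
```

Under these assumptions, the run with variable propensities should show a far longer tail of highly connected proteins than the equal-rate run, even though neither run rewards existing connectivity. A long tail in such a toy simulation is only suggestive; whether the resulting distribution is well described by a power law would need a proper fit.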
