Sequence space and the ongoing expansion of the protein universe

The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: ∼98 per cent of sites cannot accept an amino-acid substitution at any given moment, but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, ∼3.5 × 10⁹ yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
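As a rough illustration of the quantity underlying "sequence divergence", the sketch below computes the fraction of identical residues between two aligned protein sequences. It is not the authors' method, which the abstract does not specify; the function name `pairwise_identity`, the gap symbol, the toy sequences, and the uniform-composition baseline of 1/20 for random sequences are all assumptions made for illustration.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of identical residues over ungapped alignment columns.

    Hypothetical helper: both inputs are assumed to be rows of the same
    alignment, so they must have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        # Skip columns where either sequence has a gap.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


# Assumed baseline: with 20 equally frequent amino acids, two unrelated
# sequences match at a site with probability 1/20. Real compositions are
# not uniform, so the true baseline is somewhat higher.
RANDOM_IDENTITY_UNIFORM = 1 / 20

# Toy aligned sequences (invented for this example).
identity = pairwise_identity("MKTAYIAK-R", "MKSAYLAKQR")
divergence = 1.0 - identity  # simple proportion of differing sites
```

Here 7 of the 9 ungapped columns match, so the identity is 7/9. Comparing such values for increasingly distant homologs against the random baseline is one simple way to ask whether divergence has saturated.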
