Protein conformational diversity modulates sequence divergence.

It is well established that the conservation of protein structure during evolution constrains sequence divergence. The need to conserve certain physicochemical environments in order to preserve the protein fold, and hence its biological function, gives rise to a site-specific, structurally constrained substitution pattern. However, the native structure of a protein is not unique: the native state is better described as an ensemble of conformers in dynamic equilibrium. In this work, we studied the influence of conformational diversity on sequence divergence and protein evolution. For this purpose, we derived a set of 900 proteins with different degrees of conformational diversity from PCDB, a database of protein conformational diversity. Using a structurally constrained protein evolutionary model, we explored how the different conformations of each protein influence sequence divergence. We found that the presence of conformational diversity strongly modulates the substitution pattern. Although the conformers of a protein share several structurally constrained sites, 30% of these sites are conformer specific. We also found that in 76% of the proteins studied, a single conformer outperforms the others in predicting sequence divergence. Notably, this conformer is usually the one that binds ligands involved in the biological function of the protein. The existence of a conformer-specific site-substitution pattern indicates that conformational diversity could play a central role in modulating protein evolution. Furthermore, our findings suggest that new evolutionary models and bioinformatics tools should be developed to take this substitution bias into account.
