Local Packing Density Is the Main Structural Determinant of the Rate of Protein Sequence Evolution at Site Level

Functional and biophysical constraints result in site-dependent patterns of protein sequence variability. It is commonly assumed that the key structural determinant of site-specific rates of evolution is the Relative Solvent Accessibility (RSA). However, a recent study found that amino acid substitution rates correlate better with two Local Packing Density (LPD) measures, the Weighted Contact Number (WCN) and the Contact Number (CN), than with RSA. This work presents a more thorough assessment of that finding. To this end, in addition to substitution rates, we considered four other sequence variability scores, four measures of solvent accessibility (SA), and other CN measures. We compared all properties for each protein in a structurally and functionally diverse, representative dataset of monomeric enzymes. We show that the best sequence variability measures take phylogenetic tree topology into account. More importantly, we show that both LPD measures (WCN and CN) correlate better with sequence variability than any of the SA measures, regardless of the sequence variability score used. Moreover, the independent contribution of the best LPD measure is approximately four times larger than that of the best SA measure. This study strongly supports the conclusion that a site's packing density, rather than its solvent accessibility, is the main structural determinant of its rate of evolution.
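To make the two LPD measures concrete, the following is a minimal sketch of how CN and WCN are commonly computed from one representative atom per residue (for example, the C-alpha atom). It assumes the usual definitions: CN counts the other residues within a distance cutoff, and WCN sums the inverse squared distances to all other residues. The function names, the cutoff value, and the toy coordinates are illustrative assumptions and are not taken from this study.

```python
import math

def contact_number(coords, cutoff):
    """CN_i: count of other residues within `cutoff` (angstroms) of residue i.

    `coords` holds one (x, y, z) tuple per residue. The cutoff is a free
    parameter here (an assumption; the abstract does not state one).
    """
    n = len(coords)
    return [
        sum(1 for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= cutoff)
        for i in range(n)
    ]

def weighted_contact_number(coords):
    """WCN_i: sum over all j != i of 1 / r_ij**2 (no cutoff needed)."""
    n = len(coords)
    wcn = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                total += 1.0 / r2
        wcn.append(total)
    return wcn

# Hypothetical toy "structure": four residues placed along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_cn = contact_number(toy_coords, cutoff=5.0)
toy_wcn = weighted_contact_number(toy_coords)
```

In this toy example the second residue has neighbours on both sides and therefore the highest CN and WCN, while the isolated fourth residue has the lowest. Under the abstract's conclusion, densely packed sites of this kind would be expected to evolve more slowly.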
