Reducing the dimensionality of the protein‐folding search problem

How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three‐dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance‐based measure, such as the root‐mean‐square distance between target and candidate. This approach is computationally expensive because every comparison must be carried out on three‐dimensional coordinates. Here, we transform the problem into a simpler one‐dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high‐resolution protein structures. Using this 11‐letter alphabet, any protein's three‐dimensional structure can be transformed into a one‐dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence‐based comparison. The disorder → order folding transition is abridged at both ends. At the onset, folding conditions necessitate formation of hydrogen‐bonded scaffold elements on which proteins are assembled, severely restricting the size of accessible conformational space. Near the end, chain topology is established prior to emergence of the close‐packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein‐folding search problem to mapping the amino acid sequence onto a restricted basin string.
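The structure‐to‐string transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the 11 basin definitions, so the four basin centers below are hypothetical placeholders in backbone (phi, psi) space, and the function names are illustrative. Assigning each residue to the nearest center is an assumed rule, and `difflib.SequenceMatcher` is a simple stand‐in for the conventional sequence‐based comparison mentioned in the text.

```python
import math
from difflib import SequenceMatcher

# Hypothetical basin centers (phi, psi) in degrees. These are placeholders,
# NOT the 11 basins derived in the paper.
BASIN_CENTERS = {
    "A": (-63.0, -43.0),   # roughly the alpha-helical region
    "B": (-120.0, 130.0),  # roughly the beta-strand region
    "P": (-75.0, 145.0),   # roughly the polyproline II region
    "L": (57.0, 47.0),     # roughly the left-handed helical region
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def assign_basin(phi, psi):
    """Return the label of the nearest basin center (assumed assignment rule)."""
    def distance(label):
        center_phi, center_psi = BASIN_CENTERS[label]
        return math.hypot(angular_diff(phi, center_phi),
                          angular_diff(psi, center_psi))
    return min(BASIN_CENTERS, key=distance)


def basin_string(torsions):
    """Map a list of per-residue (phi, psi) pairs onto a one-dimensional string."""
    return "".join(assign_basin(phi, psi) for phi, psi in torsions)


def similarity(s1, s2):
    """Similarity of two basin strings in [0, 1] (stand-in for an alignment score)."""
    return SequenceMatcher(None, s1, s2).ratio()


if __name__ == "__main__":
    target = basin_string([(-60, -45), (-65, -40), (-120, 135), (-70, 150)])
    candidate = basin_string([(-62, -41), (-58, -47), (-118, 128), (60, 45)])
    print(target, candidate, similarity(target, candidate))
```

Once structures are reduced to strings, comparing two proteins no longer requires three‐dimensional superposition. In practice, a sequence alignment method with a substitution matrix tuned to the basin alphabet would likely replace the simple ratio used here.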
