Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal

We present performance-guaranteed approximation algorithms for the protein folding problem in the hydrophobic-hydrophilic model (Dill, 1985). Our algorithms are the first approximation algorithms in the literature with guaranteed performance for this model (Dill, 1994). The hydrophobic-hydrophilic model abstracts the dominant force of protein folding: the hydrophobic interaction. The protein is modeled as a chain of n amino acids of two types: H (hydrophobic, i.e., nonpolar) and P (hydrophilic, i.e., polar). Although this model is a simplification of more complex protein folding models, the structure prediction problem is notoriously difficult even for this model. Our algorithms run in linear (3n) or quadratic time and produce a three-dimensional protein conformation whose free energy is guaranteed to be no worse than three-eighths of optimal. This result answers the open problem of Ngo et al. (1994) about the possible existence of an efficient approximation algorithm with guaranteed performance for protein structure prediction in any well-studied model of protein folding. By achieving speed and near-optimality simultaneously, our algorithms rigorously capture salient features of the recently proposed framework of protein folding by Sali et al. (1994). Equally important, the final conformations of our algorithms have significant secondary structure (antiparallel sheets, beta-sheets, compact hydrophobic core). Furthermore, hypothetical folding pathways can be described for our algorithms that fit within the framework of diffusion-collision protein folding proposed by Karplus and Weaver (1979). Computational limitations of algorithms that compute the optimal conformation have restricted their applicability to short sequences (length ≤ 90). Because our algorithms trade computational accuracy for speed, they can construct near-optimal conformations in linear time for sequences of any size.
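To make the model concrete, the sketch below scores a given conformation. It is not the approximation algorithm described in the abstract. It relies on assumptions the abstract does not state: the chain is embedded as a self-avoiding walk on the simple cubic lattice, and each pair of H residues that are lattice neighbors but not consecutive in the chain contributes -1 to the energy, which is the usual convention for this model. The function name `hp_energy` and the coordinate representation are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of an HP chain on the cubic lattice (assumed convention:
    -1 per H-H contact between residues not adjacent in the chain)."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if any(residue not in "HP" for residue in sequence):
        raise ValueError("sequence may contain only 'H' and 'P'")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy neighboring lattice sites.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues are not lattice neighbors")

    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the three positive directions so that each
        # neighboring pair of sites is counted exactly once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a unit square: the two end residues touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(hp_energy("HPPH", square))
print(hp_energy("HPPH", line))
```

Under this convention energies are non-positive, so a guarantee of "three-eighths of optimal" means the returned conformation achieves at least three-eighths of the H-H contacts of an optimal one.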

[1] K. Dill, et al. Cooperativity in protein-folding kinetics. 1993, Proceedings of the National Academy of Sciences of the United States of America.

[2] A. J. Olson, et al. Soluble proteins: size, shape and function. 1993, Trends in biochemical sciences.

[3] Eugene I. Shakhnovich, et al. Ground state of random copolymers and the discrete random energy model. 1993.

[4] B. Lee, et al. The interpretation of protein structures: estimation of static accessibility. 1971, Journal of molecular biology.

[5] T. Creighton, et al. Circular and circularly permuted forms of bovine pancreatic trypsin inhibitor. 1983, Journal of molecular biology.

[6] F. Young. Biochemistry. 1955, The Indian Medical Gazette.

[7] K. A. Dill, et al. Side‐chain entropy and packing in proteins. 1994, Protein science : a publication of the Protein Society.

[8] K. Dill. Theory for the folding and stability of globular proteins. 1985, Biochemistry.

[9] E. Shakhnovich, et al. Engineering of stable and fast-folding sequences of model proteins. 1993, Proceedings of the National Academy of Sciences of the United States of America.

[10] M. Karplus, et al. How does a protein fold? 1994, Nature.

[11] Joe Marks, et al. Computational Complexity, Protein Structure Prediction, and the Levinthal Paradox. 1994.

[12] K. Dill, et al. Modeling compact denatured states of proteins. 1994, Biochemistry.

[13] K. Dill, et al. Theory for protein mutability and biogenesis. 1990, Proceedings of the National Academy of Sciences of the United States of America.

[14] R. Unger, et al. Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications. 1993, Bulletin of mathematical biology.

[15] R. Unger, et al. Genetic algorithms for protein folding simulations. 1992, Journal of molecular biology.

[16] E. I. Shakhnovich, et al. A test of lattice protein folding algorithms. 1995, Proceedings of the National Academy of Sciences of the United States of America.

[17] K. Dill, et al. Origins of structure in globular proteins. 1990, Proceedings of the National Academy of Sciences of the United States of America.

[18] K. Yue, et al. Forces of tertiary structural organization in globular proteins. 1995, Proceedings of the National Academy of Sciences of the United States of America.

[19] J. T. Ngo, et al. Computational complexity of a problem in molecular structure prediction. 1992, Protein engineering.

[20] Carsten Lund, et al. Proof verification and the intractability of approximation problems. 1992, FOCS 1992.

[21] Yue, et al. Sequence-structure relationships in proteins and copolymers. 1993, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[22] K. Dill, et al. Polymer principles in protein structure and stability. 1991, Annual review of biophysics and biophysical chemistry.

[23] D. Yee, et al. Principles of protein folding — A perspective from simple exact models. 1995, Protein science : a publication of the Protein Society.

[24] William E. Hart, et al. Invariant Patterns in Crystal Lattices: Implications for Protein Folding Algorithms (Extended Abstract). 1996, CPM.

[25] K. Dill, et al. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. 1989.

[26] Paul E. Stolorz, et al. Recursive approaches to the statistical physics of lattice proteins. 1994, Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[27] M. Karplus, et al. Diffusion–collision model for protein folding. 1979.

[28] Carsten Lund, et al. Proof verification and hardness of approximation problems. 1992, Proceedings, 33rd Annual Symposium on Foundations of Computer Science.

[29] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.

[30] J. Szulmajster. Protein folding. 1988, Bioscience reports.

[31] D. Lipman, et al. Modelling neutral and selective evolution of protein folding. 1991, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[32] Ron Unger, et al. Genetic Algorithm for 3D Protein Folding Simulations. 1993, ICGA.

[33] J. Davenport. Editor. 1960.

[34] H. Scheraga, et al. Experimental and theoretical aspects of protein folding. 1975, Advances in protein chemistry.