Ab initio fold prediction of small helical proteins using distance geometry and knowledge-based scoring functions.

The problem of protein tertiary structure prediction from primary sequence can be separated into two subproblems: generation of a library of possible folds and selection of the best fold from that library. A distance geometry procedure based on random pairwise metrization, which has good sampling properties, was used to generate a library of 500 possible structures for each of 11 small helical proteins. The input to distance geometry consisted of sets of restraints to enforce predicted helical secondary structure and a generic range of 5 to 11 Å between predicted contact residues on all pairs of helices. For each of the 11 targets, the resulting library contained structures with low RMSD versus the native structure. Near-native sampling was enhanced by at least three orders of magnitude compared to a random sampling of compact folds. All library members were scored with a combination of an all-atom distance-dependent function, a residue pair-potential, and a hydrophobicity function. In six of the 11 cases, the best-ranking fold was considered to be near native. Each library was also reduced to a final ab initio prediction via consensus distance geometry performed over the 50 best-ranking structures from the full set of 500. The consensus results were of generally higher quality, yielding six predictions within 6.5 Å of the native fold. These favorable predictions corresponded to the targets for which the correlation between the RMSD and the scoring function was highest. The advantage of the reported methodology is its extreme simplicity and its potential for including other types of structural restraints.
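The consensus step described above (keep the 50 best-ranking structures, then derive distance restraints they agree on) can be illustrated with a minimal sketch. The abstract does not say how the consensus distances are computed, so several choices here are assumptions made only for illustration: structures are reduced to one point per residue, lower scores are treated as better, and each restraint is taken as the mean distance plus or minus one standard deviation over the selected structures. The function names `pairwise_distances` and `consensus_bounds` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return {(i, j): distance} for every residue pair i < j of one structure.

    `coords` is a list of (x, y, z) tuples, one point per residue
    (a simplification assumed here, e.g. C-alpha positions).
    """
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


def consensus_bounds(library, scores, top_n=50):
    """Derive per-pair (lower, upper) distance bounds from the best-scoring folds.

    Assumptions (not taken from the abstract): a lower score means a better
    fold, and the bounds are mean +/- one population standard deviation.
    """
    # Rank structure indices by score and keep the top_n best.
    ranked = sorted(range(len(library)), key=lambda k: scores[k])[:top_n]
    tables = [pairwise_distances(library[k]) for k in ranked]

    bounds = {}
    for pair in tables[0]:
        values = [table[pair] for table in tables]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        # Distances cannot be negative, so clamp the lower bound at zero.
        bounds[pair] = (max(0.0, mean - sd), mean + sd)
    return bounds
```

Bounds of this kind could then be passed back to a distance geometry program as restraints for a final round of structure generation. Pairs on which the selected folds disagree receive wide bounds and constrain the result only weakly.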
