Discrete restraint‐based protein modeling and the Cα‐trace problem

We present a novel de novo method to generate protein models from sparse, discretized restraints on the conformation of the main chain and side chain atoms. We focus on Cα‐trace generation, the problem of constructing an accurate and complete model from approximate knowledge of the positions of the Cα atoms and, in some cases, the side chain centroids. Spatial restraints on the Cα atoms and side chain centroids are supplemented by constraints on main chain geometry, φ/ψ angles, rotameric side chain conformations, and inter‐atomic separations derived from analyses of known protein structures. A novel conformational search algorithm, combining features of tree‐search and genetic algorithms, generates models consistent with these restraints by propensity‐weighted dihedral angle sampling. Models with ideal geometry, good φ/ψ angles, and no inter‐atomic overlaps are produced with 0.8 Å main chain and, with side chain centroid restraints, 1.0 Å all‐atom root‐mean‐square deviation (RMSD) from the crystal structure over a diverse set of target proteins. The mean model derived from 50 independently generated models is closer to the crystal structure than any individual model, with 0.5 Å main chain RMSD under only Cα restraints and 0.7 Å all‐atom RMSD under both Cα and centroid restraints. The method is insensitive to randomly distributed errors of up to 4 Å in the Cα restraints. The conformational search algorithm is efficient, with computational cost increasing linearly with protein size. Issues relating to decoy set generation, experimental structure determination, efficiency of conformational sampling, and homology modeling are discussed.
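The observation that a mean model can lie closer to the reference than any individual model can be illustrated with a toy calculation. The sketch below is not the authors' algorithm: it assumes all models already share one coordinate frame (so no superposition is performed), and it uses synthetic, independent Gaussian noise as a stand-in for per-model error. The names `rmsd`, `mean_model`, and `demo`, and all parameter values, are hypothetical.

```python
import math
import random


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points, without superposition."""
    if len(a) != len(b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def mean_model(models):
    """Average each atom's coordinates over all models (assumed to share one frame)."""
    n = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(m[i][k] for m in models) / n for k in range(3))
        for i in range(n_atoms)
    ]


def demo(n_atoms=100, n_models=50, sigma=0.5, seed=0):
    """Return (best individual RMSD, mean-model RMSD) for a synthetic ensemble."""
    rng = random.Random(seed)
    # Hypothetical reference coordinates; a real test would use a crystal structure.
    reference = [
        tuple(rng.uniform(-20.0, 20.0) for _ in range(3)) for _ in range(n_atoms)
    ]
    # Each model is the reference plus independent Gaussian noise (an assumption).
    models = [
        [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in reference]
        for _ in range(n_models)
    ]
    best_individual = min(rmsd(m, reference) for m in models)
    return best_individual, rmsd(mean_model(models), reference)


if __name__ == "__main__":
    best, averaged = demo()
    print(f"best individual RMSD: {best:.2f}")
    print(f"mean-model RMSD:      {averaged:.2f}")
```

When the per-model errors are independent and unbiased, as assumed here, averaging cancels much of the noise, so the mean model's RMSD falls well below that of the best single model. Errors in real models are not guaranteed to behave this way, and averaged coordinates generally no longer have ideal bond geometry.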
