Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments

Because modeling changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, which consists of fragments from 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Å. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Å RMSD. As expected, regular structures are well covered. Interestingly, many loop regions that appear irregular at first glance also form recurrent structural motifs, albeit with a lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements between 4 and 8 residues long. Finally, we observed that a significant number of short sequences display strong structural ambiguity between alpha-helical and extended conformations. As the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
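The coverage figure quoted above can be made concrete with a small sketch. It assumes one plausible reading of "structural coverage": the fraction of residues that fall inside at least one fragment match whose RMSD is at or below the cutoff. The record layout `(start, length, rmsd)`, the function name `structural_coverage`, and the example values are hypothetical and are not taken from BriX itself.

```python
from typing import Iterable, Tuple

# Hypothetical record layout (an assumption, not the BriX format):
# (start residue index, fragment length, RMSD in Angstrom)
Match = Tuple[int, int, float]


def structural_coverage(
    n_residues: int,
    matches: Iterable[Match],
    rmsd_cutoff: float = 1.0,
) -> float:
    """Return the fraction of residues covered by at least one
    fragment match with RMSD at or below rmsd_cutoff."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    covered = [False] * n_residues
    for start, length, rmsd in matches:
        if rmsd > rmsd_cutoff:
            continue  # match is too far from the native backbone
        # Clamp the fragment to the bounds of the chain.
        for i in range(max(start, 0), min(start + length, n_residues)):
            covered[i] = True
    return sum(covered) / n_residues


# Made-up matches for a 20-residue chain, for illustration only.
example_matches = [(0, 4, 0.3), (3, 8, 0.6), (12, 5, 1.4), (14, 6, 0.9)]
coverage = structural_coverage(20, example_matches)
print(f"coverage = {coverage:.2f}")  # prints "coverage = 0.85"
```

In this toy case the match at residue 12 is rejected because its RMSD exceeds 1 Å, so residues 11 to 13 stay uncovered and the coverage is 17 of 20 residues.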
