Mining the Protein Data Bank with CReF to predict approximate 3-D structures of polypeptides

In this paper we describe CReF, a Central Residue Fragment-based method that predicts approximate 3-D structures of polypeptides by mining the Protein Data Bank (PDB). The predicted structures are approximate, but good enough to serve as starting conformations for refinement with state-of-the-art molecular mechanics methods such as molecular dynamics simulations. CReF is very fast, and we illustrate its efficacy in three case studies of polypeptides ranging from 34 to 70 amino acids in length. Our initial results, as indicated by the RMSD values, show that the predicted structures adopt the expected fold, similar to that of the experimental structures.
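For readers unfamiliar with the RMSD measure mentioned above, the following is a minimal sketch of how a root-mean-square deviation between a predicted and an experimental structure can be computed. It assumes the two structures have already been optimally superposed and that their atoms (for example, the Cα atoms) are matched one to one. The function name `rmsd` and the toy coordinates are illustrative only and are not taken from the CReF implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: both structures are already superposed and the i-th point of
    coords_a corresponds to the i-th point of coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")

    # Sum of squared distances between corresponding atoms.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    # Mean over atoms, then square root.
    return math.sqrt(total / len(coords_a))


# Toy coordinates (made up for illustration, in angstroms).
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 1.0, 0.0)]
print(f"RMSD = {rmsd(predicted, experimental):.3f}")
```

In practice the superposition step comes first, typically through a least-squares fit of one structure onto the other. Lower values indicate closer agreement between the predicted and experimental conformations.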
