The protein structure prediction problem could be solved using the current PDB library.

For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 A with approximately 82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The tasser algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 A (97% of them below 4 A). On average, the RMSD of full-length models is 2.25 A, with aligned regions improved from 2.5 A to 1.88 A, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.

[1]  Yang Zhang,et al.  Large-scale assessment of the utility of low-resolution protein structures for biochemical function assignment , 2004, Bioinform..

[2]  J. Fetrow,et al.  Sequence- and structure-based protein function prediction from genomic information. , 2001, Current opinion in drug discovery & development.

[3]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[4]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[5]  J. Skolnick,et al.  TOUCHSTONE II: a new approach to ab initio protein structure prediction. , 2003, Biophysical journal.

[6]  Jacquelyn S. Fetrow,et al.  Structural genomics and its importance for gene function analysis , 2000, Nature Biotechnology.

[7]  B Honig,et al.  An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. , 2000, Journal of molecular biology.

[8]  A. Sali,et al.  Comparative protein structure modeling of genes and genomes. , 2000, Annual review of biophysics and biomolecular structure.

[9]  Andrew C. R. Martin,et al.  Assessment of comparative modeling in CASP2 , 1997, Proteins.

[10]  J F Gibrat,et al.  Surprising similarities in structure comparison. , 1996, Current opinion in structural biology.

[11]  Frances M. G. Pearl,et al.  Quantifying the similarities within fold space. , 2002, Journal of molecular biology.

[12]  Yang Zhang,et al.  SPICKER: A clustering approach to identify near‐native protein folds , 2004, J. Comput. Chem..

[13]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[14]  David C. Jones,et al.  CATH--a hierarchic classification of protein domain structures. , 1997, Structure.

[15]  J. Skolnick,et al.  The PDB is a covering set of small protein structures. , 2003, Journal of molecular biology.

[16]  G. Fredrickson The theory of polymer dynamics , 1996 .

[17]  J. Skolnick,et al.  Assembly of protein structure from sparse experimental data: An efficient Monte Carlo model , 1998, Proteins.

[18]  T. Blundell,et al.  Comparative protein modelling by satisfaction of spatial restraints. , 1993, Journal of molecular biology.

[19]  M. James,et al.  A critical assessment of comparative molecular modeling of tertiary structures of proteins * , 1995, Proteins.

[20]  S. Bryant,et al.  Statistics of sequence-structure threading. , 1995, Current opinion in structural biology.

[21]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[22]  David C. Jones,et al.  GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. , 1999, Journal of molecular biology.

[23]  Anna Tramontano,et al.  Assessment of homology‐based predictions in CASP5 , 2003, Proteins.

[24]  Mark Gerstein,et al.  Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures , 1996, ISMB.

[25]  T. P. Flores,et al.  Multiple protein structure alignment , 1994, Protein science : a publication of the Protein Society.

[26]  J. Skolnick,et al.  Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding , 2002, Proteins.

[27]  J. Skolnick,et al.  Finding the needle in a haystack: educing native folds from ambiguous ab initio protein structure predictions , 2001 .

[28]  C Sander,et al.  Mapping the Protein Universe , 1996, Science.

[29]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[30]  A. Sali,et al.  Modeling of loops in protein structures , 2000, Protein science : a publication of the Protein Society.

[31]  Z. Ou-Yang,et al.  Stretching single-stranded DNA: interplay of electrostatic, base-pairing, and base-pair stacking interactions. , 2001, Biophysical journal.

[32]  Alfred Wittinghofer,et al.  RanGAP mediates GTP hydrolysis without an arginine finger , 2002, Nature.

[33]  C. Sander,et al.  Dali: a network tool for protein structure comparison. , 1995, Trends in biochemical sciences.

[34]  J. Skolnick,et al.  Automated structure prediction of weakly homologous proteins on a genomic scale. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[35]  I D Kuntz,et al.  Peter Andrew Kollman , 2001, Proteins.

[36]  J P Glusker,et al.  X-ray crystallography of proteins. , 1994, Methods of biochemical analysis.

[37]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[38]  Timothy F. Havel,et al.  NMR structure determination in solution: a critique and comparison with X-ray crystallography. , 1992, Annual review of biophysics and biomolecular structure.

[39]  J. Skolnick,et al.  Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm , 2004, Proteins.

[40]  Matthias Rief,et al.  Elastically Coupled Two-Level Systems as a Model for Biopolymer Extensibility , 1998 .