Fold recognition with minimal gaps

Here we present a simplified form of threading that uses only a 20 × 20 two‐body residue‐based potential and a restricted number of gaps. Despite its simplicity and transparency, the Monte Carlo‐based threading algorithm performs very well in a rigorous test of fold recognition. The results suggest that by simplifying and constraining the decoy space, one can achieve better fold recognition. Fold recognition results are compared with and supplemented by a PSI‐BLAST search. The statistical significance of threading results is rigorously evaluated from statistics of extremes by comparison with optimal alignments of a large set of randomly shuffled sequences. The statistical theory, based on the Random Energy Model, yields a cumulative statistical parameter, ϵ, that attests to the likelihood of correct fold recognition. A large ϵ indicates a significant energy gap between the optimal alignment and decoy alignments and, consequently, a high probability that the fold is correctly recognized. At a particular number of gaps, the ϵ parameter reaches its maximal value, and the fold is recognized. As the number of gaps increases further, the likelihood of correct fold recognition drops off. This is because, when gaps are restricted to a small number, the decoy space is small while the native alignment is still well approximated, whereas an unrestricted increase in the number of gaps leads to rapid growth of the number of decoys and their statistical dominance over the correct alignment. It is shown that the best results are obtained when a combination of one‐, two‐, and three‐gap threading is used. For this purpose, the ϵ parameter is crucial, because it allows rigorous comparison of results across decoy spaces corresponding to different numbers of gaps.

Proteins 2003;51:531–543. © 2003 Wiley‐Liss, Inc.
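The pieces named in the abstract (a pairwise contact potential, a cap on the number of gaps, and significance judged against optimal alignments of shuffled sequences) can be illustrated with a small sketch. It assumes, hypothetically, that a template is described only by its length and a list of residue contacts, and that every gap is an insertion in the query. The 20 × 20 potential is filled with random placeholder values, not a published potential. The search is exhaustive enumeration, swapped in for the paper's Monte Carlo search, so it is practical only at toy sizes. The significance score is a plain Z-score of the query's optimal energy against the optimal energies of shuffled queries; it is a simple stand-in, not the paper's ϵ parameter. All function names and demo data are hypothetical.

```python
import itertools
import random
import statistics

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def random_potential(seed=0):
    """Hypothetical symmetric 20x20 contact potential with placeholder values."""
    rng = random.Random(seed)
    u = {}
    for i, a in enumerate(AMINO):
        for b in AMINO[i:]:
            u[(a, b)] = u[(b, a)] = rng.gauss(0.0, 1.0)
    return u


def alignments(n_seq, n_struct, max_gaps):
    """Yield shift vectors: template position k holds query residue k + shift[k].

    Shifts are non-decreasing with at most `max_gaps` increases, i.e. every
    gap is an insertion in the query. Zero-length gaps are allowed, so some
    vectors are yielded more than once (harmless when taking a minimum).
    """
    max_shift = n_seq - n_struct
    for breaks in itertools.combinations(range(1, n_struct), max_gaps):
        bounds = (0, *breaks, n_struct)
        for shifts in itertools.combinations_with_replacement(range(max_shift + 1), max_gaps + 1):
            vec = []
            for seg, s in enumerate(shifts):
                vec.extend([s] * (bounds[seg + 1] - bounds[seg]))
            yield vec


def energy(seq, contacts, shift, u):
    """Contact energy of one threading: sum of u over all template contacts (i, j)."""
    return sum(u[(seq[i + shift[i]], seq[j + shift[j]])] for i, j in contacts)


def best_energy(seq, contacts, n_struct, max_gaps, u):
    """Lowest energy over all alignments with at most `max_gaps` gaps (exhaustive)."""
    return min(energy(seq, contacts, s, u) for s in alignments(len(seq), n_struct, max_gaps))


def shuffle_z_score(seq, contacts, n_struct, max_gaps, u, n_shuffles=100, seed=1):
    """Return (E_opt, Z); Z > 0 when the query beats a typical shuffled query's optimum."""
    rng = random.Random(seed)
    e_opt = best_energy(seq, contacts, n_struct, max_gaps, u)
    decoy_minima = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)  # same composition, scrambled order
        decoy_minima.append(best_energy("".join(letters), contacts, n_struct, max_gaps, u))
    mu, sd = statistics.mean(decoy_minima), statistics.stdev(decoy_minima)
    return e_opt, (mu - e_opt) / sd


if __name__ == "__main__":
    u = random_potential()
    n_struct = 12  # hypothetical template length
    contacts = [(0, 5), (1, 9), (2, 7), (3, 11), (4, 10), (6, 11), (0, 8), (2, 10)]
    query = "MKTAYIAKQRQISFVK"  # hypothetical 16-residue query
    for g in range(3):
        n_distinct = len({tuple(s) for s in alignments(len(query), n_struct, g)})
        e_opt, z = shuffle_z_score(query, contacts, n_struct, g, u, n_shuffles=50)
        print(f"gaps <= {g}: {n_distinct} alignments, E_opt = {e_opt:.2f}, Z = {z:.2f}")
```

Because allowing more gaps only enlarges the set of alignments, the optimal energy can never rise as `max_gaps` grows (the distinct alignment count here goes 5, 115, 665 for 0, 1, 2 gaps). The Z-score asks whether the shuffled queries, which search the same enlarged space, improve even faster.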
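The claim that unrestricted gaps let decoys dominate can be made semi-quantitative with a Random Energy Model estimate. This is a heuristic illustration under stated assumptions, not the paper's derivation of ϵ: decoy energies are taken as independent Gaussian variables with mean Ē and standard deviation σ, and gaps are assumed to be insertions in a query of length N threaded onto a template of length L. The number of alignments with exactly g gaps, and the expected lowest decoy energy (to leading order in a large number of decoys), are then

```latex
M(g) = \binom{L-1}{g}\binom{N-L+1}{g+1},
\qquad
E^{\mathrm{decoy}}_{\min}(g) \approx \bar{E} - \sigma\sqrt{2\ln \sum_{h=0}^{g} M(h)}
```

The first binomial counts the positions of the g gaps, and the second counts the strictly increasing offsets of the g + 1 ungapped segments. For L = 100 and N = 120, gapless threading gives M(0) = 21 alignments and √(2 ln 21) ≈ 2.5, whereas three gaps give M(3) = 156849 × 5985 ≈ 9.4 × 10^8 and √(2 ln M) ≈ 6.4 (the terms with fewer gaps barely change this). In this model, the correct alignment must gain roughly 4σ in energy from the extra gaps just to keep pace with the best decoy, which is why restricting the number of gaps can help.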