Single‐body residue‐level knowledge‐based energy score combined with sequence‐profile and secondary structure information for fold recognition

An elaborate knowledge‐based energy function is designed for fold recognition. It is a residue‐level single‐body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact‐energy term. Combining this energy score with sequence‐profile and secondary structure information yields a fold‐recognition algorithm called SPARKS (Sequence, secondary structure Profiles and Residue‐level Knowledge‐based energy Score). Compared with the popular PSI‐BLAST, SPARKS is 21% more accurate in sequence‐sequence alignment on the ProSup benchmark, and 10%, 25%, and 20% more sensitive in detecting family‐, superfamily‐, and fold‐level similarities, respectively, on the Lindahl benchmark. Moreover, among more than twenty non‐consensus servers in LiveBench 7, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than those of the first false positives). The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is available for academic users at http://theory.med.buffalo.edu. Proteins 2004. © 2004 Wiley‐Liss, Inc.
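The point of a single‐body potential is that the score for placing query residue i on template position j depends only on (i, j). Pairwise contact potentials couple different alignment positions, and that coupling is what breaks exact dynamic programming. The sketch below illustrates this under loudly stated assumptions. It is not the SPARKS implementation. The weights `w_ss` and `w_energy`, the secondary‐structure agreement matrix, the collapse of the torsion, burial, and contact terms into a single per‐pair `energy` value, the linear gap penalty, and the global (Needleman‐Wunsch) alignment mode are all hypothetical choices made for illustration.

```python
"""Hypothetical sketch: a per-position (single-body) score permits exact DP alignment.

Assumptions, not taken from SPARKS: linear weighting of the three score
components, a linear gap penalty, and global (Needleman-Wunsch) alignment.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def combined_score(profile: Matrix, ss_match: Matrix, energy: Matrix,
                   w_ss: float = 1.0, w_energy: float = 1.0) -> List[List[float]]:
    """Per-pair score S[i][j] for query residue i placed on template position j.

    Every term depends only on (i, j), which is what "single-body" buys.
    Lower (more favourable) energy should raise the score, so it is subtracted.
    """
    n, m = len(profile), len(profile[0])
    return [[profile[i][j] + w_ss * ss_match[i][j] - w_energy * energy[i][j]
             for j in range(m)] for i in range(n)]


def align(score: Matrix, gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment by dynamic programming; returns (best score, aligned pairs)."""
    n, m = len(score), len(score[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap            # query residues aligned to leading gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap            # template positions left unaligned
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],  # align i to j
                          F[i - 1][j] + gap,                      # gap in template
                          F[i][j - 1] + gap)                      # gap in query
    # Traceback: recompute the same expressions to find which move was taken.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


if __name__ == "__main__":
    # One query residue, three template positions; a favourable (negative)
    # energy at position 2 pulls the alignment there.
    S = combined_score([[0, 0, 0]], [[0, 0, 0]], [[0, 0, -3]])
    print(align(S))  # (1.0, [(0, 2)])
```

Because each cell of `F` needs only its three neighbours and the precomputed `S[i][j]`, the alignment costs O(nm) time. If the energy term depended on which template position some other query residue occupied, this recurrence would no longer be exact.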
