SP5: Improving Protein Fold Recognition by Using Torsion Angle Profiles and Profile-Based Gap Penalty Model

Recognizing the structural fold of a protein is one of the central challenges in protein structure prediction. We have developed a series of single (non-consensus) methods (SPARKS, SP2, SP3 and SP4) based on the weighted matching of two to four sequence- and structure-based profiles. The accuracy and sensitivity of fold recognition improve robustly as the number of matched profiles increases. Here, we introduce a new profile-profile comparison term based on real-value dihedral (torsion) angles. Combined with an updated real-value solvent accessibility profile and a new variable gap-penalty model based on a fractional power of insertion/deletion profiles, the new method (SP5) yields a robust improvement over the previous SP methods. On two independent benchmarks, SP5 improves alignment accuracy over SP4 by 2% in absolute terms (a 5% relative improvement). Moreover, SP5 achieves a 7% absolute increase (22% relative improvement) in the success rate of recognizing the correct structural fold, and a 32% relative improvement in the accuracy of models within the same fold on the Lindahl benchmark. In addition, the modeling accuracy of top-1 ranked models improves by 12% over SP4 for the difficult targets in the CASP7 test set. These results highlight the importance of harnessing predicted structural properties for challenging remote-homolog recognition. The SP5 server is available at http://sparks.informatics.iupui.edu.
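The abstract describes SP5's ingredients only at a high level. It combines several weighted profiles, a profile-profile term on real-value torsion angles, a solvent accessibility profile, and a gap penalty built from a fractional power of insertion/deletion profiles. The following Python sketch shows one way such pieces could fit together in a Smith-Waterman local alignment. It is not SP5's implementation. The `Position` record, the weights, the linear angle similarity, the use of a single torsion angle per residue, and the gap formula `g_max * (1 - f**alpha)` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Position:
    """Profile data for one residue (hypothetical record; fields are illustrative)."""
    prof: Sequence[float]  # sequence-profile vector, e.g. 20 amino-acid scores
    rsa: float             # relative solvent accessibility in [0, 1]
    angle: float           # one backbone torsion angle in degrees (assumption)


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def torsion_score(a: float, b: float) -> float:
    """Hypothetical linear similarity: +1 for equal angles, -1 for angles 180 degrees apart."""
    return 1.0 - angle_diff(a, b) / 90.0


def pair_score(q: Position, t: Position, weights: Tuple[float, float, float]) -> float:
    """Weighted sum of three per-position profile-profile terms (weights are hypothetical)."""
    w_seq, w_sa, w_tor = weights
    seq_term = sum(x * y for x, y in zip(q.prof, t.prof))  # profile dot product
    sa_term = -abs(q.rsa - t.rsa)                           # penalize accessibility mismatch
    tor_term = torsion_score(q.angle, t.angle)             # reward similar torsion angles
    return w_seq * seq_term + w_sa * sa_term + w_tor * tor_term


def gap_penalty(indel_freq: float, g_max: float = 3.0, alpha: float = 0.5) -> float:
    """Hypothetical variable gap penalty from an indel frequency in [0, 1].

    With a fractional power alpha < 1, even rare indels lower the penalty noticeably.
    A position where indels are always observed (freq = 1) costs nothing to gap.
    """
    return g_max * (1.0 - indel_freq ** alpha)


def local_align(query: List[Position], template: List[Position],
                indel: List[float], weights=(1.0, 1.0, 1.0)) -> float:
    """Smith-Waterman local alignment score with template-position-dependent linear gaps."""
    n, m = len(query), len(template)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g = gap_penalty(indel[j - 1])  # gap cost taken from template position j
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + pair_score(query[i - 1], template[j - 1], weights),
                H[i - 1][j] - g,  # query residue i aligned to a gap
                H[i][j - 1] - g,  # template residue j aligned to a gap
            )
            best = max(best, H[i][j])
    return best


q = [Position([1, 0], 0.2, -60.0), Position([0, 1], 0.8, 120.0)]
print(local_align(q, q, indel=[0.0, 0.0]))  # 4.0
```

In this toy setup, raising the template's indel frequency at a loop position makes a gap there cheap. The aligner can then skip an inserted residue instead of forcing a poor match. This is the qualitative effect a profile-based gap model aims for, although SP5's actual functional form and parameters are not given in the abstract.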
