Recognizing protein substructure similarity using segmental threading.

Protein template identification is essential to protein structure and function prediction. However, conventional whole-chain threading approaches often fail to recognize conserved substructure motifs when the target and templates do not share the same fold. We developed a new approach, SEGMER, for identifying protein substructure similarities by segmental threading. The target sequence is split into segments of two to four consecutive or nonconsecutive secondary structural elements, which are then threaded through the PDB to identify appropriate substructure motifs. SEGMER was tested on 144 nonredundant hard proteins. When combined with whole-chain threading, SEGMER increases the TM-score of the alignments and the accuracy of the spatial restraints by 16% and 25%, respectively, compared with whole-chain threading alone. When tested on 12 free modeling targets from CASP8, SEGMER increases the TM-score and contact accuracy by 28% and 48%, respectively. This significant improvement should have an important impact on protein structure modeling and functional inference.
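
The segment-splitting step can be illustrated with a minimal Python sketch. It is not the SEGMER implementation: the function names (`parse_sse`, `enumerate_segments`), the three-state H/E/C input string, and the choice to enumerate every ordered combination of elements are assumptions made for illustration only, since the abstract does not say how nonconsecutive elements are selected.

```python
from itertools import combinations

def parse_sse(ss):
    """Group a per-residue secondary structure string (H = helix,
    E = strand, anything else = coil) into elements.
    Returns a list of (type, start, end) with 0-based inclusive indices.
    Hypothetical helper, not part of SEGMER."""
    elements = []
    start = 0
    for i in range(1, len(ss) + 1):
        # Close the current run at the end of the string or on a state change.
        if i == len(ss) or ss[i] != ss[start]:
            if ss[start] in "HE":
                elements.append((ss[start], start, i - 1))
            start = i
    return elements

def enumerate_segments(elements, min_size=2, max_size=4):
    """Yield (indices, is_consecutive) for every group of min_size..max_size
    elements, kept in sequence order. Assumption: all combinations are
    candidates; a real method would likely filter them further."""
    for k in range(min_size, max_size + 1):
        for idx in combinations(range(len(elements)), k):
            # Indices are sorted, so the group is consecutive exactly when
            # it spans k adjacent elements.
            yield idx, (idx[-1] - idx[0] == k - 1)

if __name__ == "__main__":
    ss = "CCHHHHCCEEEECCHHHHCCEEEE"  # toy predicted secondary structure
    elems = parse_sse(ss)
    for idx, consecutive in enumerate_segments(elems):
        label = "consecutive" if consecutive else "nonconsecutive"
        print([elems[i] for i in idx], label)
```

Each yielded group would then be threaded against the template library as an independent query. Note that the number of combinations grows quickly with the number of elements, so the exhaustive enumeration shown here is only practical for small proteins or with additional filtering.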
