A multi-template combination algorithm for protein comparative modeling

BackgroundMultiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available.ResultsHere we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target.We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure.ConclusionWe have developed a novel multi-template algorithm to improve protein comparative modeling.

[1]  D. Phillips,et al.  A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen's egg-white lysozyme. , 1969, Journal of molecular biology.

[2]  Michael J. E. Sternberg,et al.  Three-dimensional structural aspects of the design of new protein molecules , 1986, Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences.

[3]  T. Blundell,et al.  Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. , 1987, Protein engineering.

[4]  T. L. Blundell,et al.  Knowledge-based prediction of protein structures and the design of novel molecules , 1987, Nature.

[5]  S. Wodak Computer‐Aided Design in Protein Engineering , 1987, Annals of the New York Academy of Sciences.

[6]  A. D. McLachlan,et al.  Profile analysis: detection of distantly related proteins. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[7]  S. Henikoff,et al.  Finding protein similarities with nucleotide sequence databases. , 1990, Methods in enzymology.

[8]  John P. Overington,et al.  Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction , 1990, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[9]  J. Greer Comparative modeling methods: Application to the family of the mammalian serine proteases , 1990, Proteins.

[10]  M. Levitt Accurate modeling of protein conformation by automatic segment matching. , 1992, Journal of molecular biology.

[11]  T. Blundell,et al.  Comparative protein modelling by satisfaction of spatial restraints. , 1993, Journal of molecular biology.

[12]  D Gusfield,et al.  Efficient methods for multiple sequence alignment with guaranteed error bounds , 1993, Bulletin of mathematical biology.

[13]  C. Sander,et al.  Protein structure comparison by alignment of distance matrices. , 1993, Journal of molecular biology.

[14]  P. Koehl,et al.  Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. , 1994, Journal of molecular biology.

[15]  B. Rost,et al.  Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.

[16]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[17]  T. P. Flores,et al.  Multiple protein structure alignment , 1994, Protein science : a publication of the Protein Society.

[18]  M. Karplus,et al.  Evaluation of comparative protein modeling by MODELLER , 1995, Proteins.

[19]  J F Gibrat,et al.  Surprising similarities in structure comparison. , 1996, Current opinion in structural biology.

[20]  N. Guex,et al.  SWISS‐MODEL and the Swiss‐Pdb Viewer: An environment for comparative protein modeling , 1997, Electrophoresis.

[21]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[22]  T. Hubbard,et al.  Critical assessment of methods of protein structure prediction (CASP): Round III , 1999, Proteins.

[23]  S. Bryant,et al.  Critical assessment of methods of protein structure prediction (CASP): Round II , 1997, Proteins.

[24]  Gapped BLAST and PSI-BLAST: A new , 1997 .

[25]  C. Chothia,et al.  Intermediate sequences increase the detection of homology between sequences. , 1997, Journal of molecular biology.

[26]  R Sánchez,et al.  Advances in comparative protein-structure modelling. , 1997, Current opinion in structural biology.

[27]  A. Sali 100,000 protein structures for the biologist , 1998, Nature Structural Biology.

[28]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[29]  Richard Hughey,et al.  Hidden Markov models for detecting remote protein homologies , 1998, Bioinform..

[30]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[31]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[32]  A. Sali,et al.  Modeling of loops in protein structures , 2000, Protein science : a publication of the Protein Society.

[33]  A. Sali,et al.  Comparative protein structure modeling of genes and genomes. , 2000, Annual review of biophysics and biomolecular structure.

[34]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2008, Protein science : a publication of the Protein Society.

[35]  M C Peitsch,et al.  Protein structure computing in the genomic era. , 2000, Research in microbiology.

[36]  H. Umeyama,et al.  An automatic homology modeling method consisting of database searches and simulated annealing. , 2000, Journal of molecular graphics & modelling.

[37]  B Honig,et al.  Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Chris Sander,et al.  Completeness in structural genomics , 2001, Nature Structural Biology.

[39]  S. Brenner A tour of structural genomics , 2001, Nature Reviews Genetics.

[40]  T. Hubbard,et al.  Critical assessment of methods of protein structure prediction (CASP)‐round V , 2003, Proteins.

[41]  M J Sternberg,et al.  Enhancement of protein modeling by human intervention in applying the automatic programs 3D‐JIGSAW and 3D‐PSSM , 2001, Proteins.

[42]  R Leplae,et al.  Analysis and assessment of comparative modeling predictions in CASP4 , 2001, Proteins.

[43]  I D Kuntz,et al.  Peter Andrew Kollman , 2001, Proteins.

[44]  Andrej Sali Target practice , 2001, Nature Structural Biology.

[45]  P Rotkiewicz,et al.  Generalized comparative modeling (GENECOMP): A combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement , 2001, Proteins.

[46]  Christophe G. Lambert,et al.  ESyPred3D: Prediction of proteins 3D structures , 2002, Bioinform..

[47]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[48]  Ceslovas Venclovas,et al.  Comparative modeling in CASP5: Progress is evident, but alignment errors remain a significant hindrance , 2003, Proteins.

[49]  Tatiana A. Tatusova,et al.  NCBI Reference Sequence Project: update and current status , 2003, Nucleic Acids Res..

[50]  Adam Zemla,et al.  LGA: a method for finding 3D similarities in protein structures , 2003, Nucleic Acids Res..

[51]  Lei Xie,et al.  Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling , 2003, Proteins.

[52]  B. Rost,et al.  Critical assessment of methods of protein structure prediction (CASP)—Round 6 , 2005, Proteins.

[53]  Lars Malmström,et al.  Automated prediction of CASP‐5 structures using the Robetta server , 2003, Proteins.

[54]  Anna Tramontano,et al.  Assessment of homology‐based predictions in CASP5 , 2003, Proteins.

[55]  Manuel C. Peitsch,et al.  SWISS-MODEL: an automated protein homology-modeling server , 2003, Nucleic Acids Res..

[56]  Marcin von Grotthuss,et al.  ORFeus: detection of distant homology using sequence profiles and predicted secondary structure , 2003, Nucleic Acids Res..

[57]  Nick V. Grishin,et al.  Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments , 2003, Bioinform..

[58]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[59]  Zukang Feng,et al.  The Protein Data Bank and structural genomics , 2003, Nucleic Acids Res..

[60]  Ying Xu,et al.  Raptor: Optimal Protein Threading by Linear Programming , 2003, J. Bioinform. Comput. Biol..

[61]  Daniel Fischer,et al.  3DS3 and 3DS5 3D‐SHOTGUN meta‐predictors in CAFASP3 , 2003, Proteins.

[62]  David Baker,et al.  Protein structure prediction and analysis using the Robetta server , 2004, Nucleic Acids Res..

[63]  Arne Elofsson,et al.  Profile–profile methods provide improved fold‐recognition: A study of different profile–profile alignment methods , 2004, Proteins.

[64]  A. Sali,et al.  Alignment of protein sequences by their profiles , 2004, Protein science : a publication of the Protein Society.

[65]  Andrew E. Torda,et al.  Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices , 2004, Nucleic Acids Res..

[66]  Kimmen Sjölander,et al.  COACH : profile-profile alignment of protein families using hidden Markov models , 2003 .

[67]  Hongyi Zhou,et al.  Quantifying the effect of burial of amino acid residues on protein stability , 2003, Proteins.

[68]  J. Skolnick,et al.  Automated structure prediction of weakly homologous proteins on a genomic scale. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[69]  Arne Elofsson,et al.  Using evolutionary information for the query and target improves fold recognition , 2004, Proteins.

[70]  J. Skolnick,et al.  Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm , 2004, Proteins.

[71]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[72]  B. Honig,et al.  Protein structure prediction: inroads to biology. , 2005, Molecular cell.

[73]  Hongyi Zhou,et al.  Fold recognition by combining sequence profiles derived from evolution and from depth‐dependent structural alignment of fragments , 2004, Proteins.

[74]  Č. Venclovas,et al.  Comparative modeling in CASP6 using consensus approach to template selection, sequence‐structure alignment, and structure assessment , 2005, Proteins.

[75]  Yang Zhang,et al.  TASSER: An automated method for the prediction of protein tertiary structures in CASP6 , 2005, Proteins.

[76]  Ming Li,et al.  Consensus fold recognition by predicted model quality , 2005, APBC.

[77]  Arne Elofsson,et al.  All are not equal: A benchmark of different homology modeling programs , 2005, Protein science : a publication of the Protein Society.

[78]  Alfonso Valencia,et al.  Assessment of predictions submitted for the CASP6 comparative modeling category , 2005, Proteins.

[79]  Arne Elofsson,et al.  Pcons5: combining consensus, structural evaluation and fold recognition scores , 2005, Bioinform..

[80]  Yang Zhang,et al.  The protein structure prediction problem could be solved using the current PDB library. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[81]  Johannes Söding,et al.  Protein homology detection by HMM?CHMM comparison , 2005, Bioinform..

[82]  Paul Kearney,et al.  The use of functional domains to improve transmembrane protein topology prediction. , 2005 .

[83]  Yang Zhang,et al.  TASSER-Lite: an automated tool for protein comparative modeling. , 2006, Biophysical journal.

[84]  Aleksandar Poleksic,et al.  STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile–profile scoring , 2006, Proteins.

[85]  Pierre Baldi,et al.  A machine learning information retrieval approach to protein fold recognition. , 2006, Bioinformatics.

[86]  David E. Kim,et al.  Physically realistic homology models built with ROSETTA can be more accurate than their templates. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[87]  J. Skolnick,et al.  Ab initio modeling of small proteins by iterative TASSER simulations , 2007, BMC Biology.

[88]  Keehyoung Joo,et al.  High accuracy template based modeling by global optimization , 2007, Proteins.

[89]  Randy J Read,et al.  Automated server predictions in CASP7 , 2007, Proteins.

[90]  Randy J Read,et al.  Assessment of CASP7 predictions in the high accuracy template‐based modeling category , 2007, Proteins.

[91]  Yang Zhang,et al.  Template‐based modeling and free modeling by I‐TASSER in CASP7 , 2007, Proteins.

[92]  Lars Malmström,et al.  Structure prediction for CASP7 targets using extensive all‐atom refinement with Rosetta@home , 2007, Proteins.

[93]  Seung Yup Lee,et al.  Analysis of TASSER‐based CASP7 protein structure prediction results , 2007, Proteins.

[94]  Sitao Wu,et al.  LOMETS: A local meta-threading-server for protein structure prediction , 2007, Nucleic acids research.

[95]  Torsten Schwede,et al.  Automated comparative protein structure modeling with SWISS‐MODEL and Swiss‐PdbViewer: A historical perspective , 2009, Electrophoresis.

[96]  H. Mohr,et al.  A Critical Assessment , 1985, The Federal Estate Tax.