Recursive protein modeling: A divide and conquer strategy for protein structure prediction and its case study in CASP9

After decades of research, protein structure prediction remains a very challenging problem. To address the different levels of complexity in structure modeling, two types of techniques have been developed: template-based modeling and template-free modeling. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein, but it fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. The two techniques have seldom been integrated to improve protein modeling. Here we develop a recursive protein modeling approach that selectively and collaboratively applies template-based and template-free modeling methods to the template-covered (i.e., certain) and template-free (i.e., uncertain) regions of a protein, respectively. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrates template-based and template-free modeling to improve the quality and efficiency of protein structure prediction.
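The split between template-covered and template-free regions can be illustrated with a minimal sketch. It assumes a per-residue boolean mask that marks whether any template alignment covers each residue; the function names `split_regions` and `plan_modeling` are hypothetical and are not taken from the authors' implementation, which is not described at this level of detail here.

```python
from typing import List, Tuple


def split_regions(covered: List[bool]) -> List[Tuple[int, int, bool]]:
    """Group consecutive residues that share the same coverage status.

    Returns (start, end, is_covered) segments with `end` exclusive.
    `covered` is an assumed input: True where a template aligns.
    """
    segments: List[Tuple[int, int, bool]] = []
    start = 0
    for i in range(1, len(covered) + 1):
        # Close the current segment at the end of the chain or
        # whenever the coverage status changes.
        if i == len(covered) or covered[i] != covered[start]:
            segments.append((start, i, covered[start]))
            start = i
    return segments


def plan_modeling(covered: List[bool]) -> List[Tuple[int, int, str]]:
    """Label each segment with the modeling technique it would receive."""
    return [
        (start, end, "template-based" if is_cov else "template-free")
        for start, end, is_cov in split_regions(covered)
    ]


if __name__ == "__main__":
    # Hypothetical 6-residue query: residues 0-2 and 5 are template-covered.
    mask = [True, True, True, False, False, True]
    for start, end, method in plan_modeling(mask):
        print(f"residues {start}-{end - 1}: {method}")
```

In a recursive scheme, the template-free segments would be modeled with the template-covered parts held fixed, and the procedure could then be repeated on any regions that remain uncertain; that loop is left out of this sketch.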
