Integration of QUARK and I‐TASSER for Ab Initio Protein Structure Prediction in CASP11

We tested two pipelines developed for template‐free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. QUARK successfully constructed models with a TM‐score above 0.4 for five free‐modeling (FM) targets, including the first model of T0837‐D1, which has a TM‐score of 0.736 and an RMSD of 2.9 Å to the native structure. Detailed analysis showed that this success is partly attributable to the high‐resolution contact map prediction derived from fragment‐based distance profiles; the predicted contacts are mainly located between regular secondary structure elements and loops/turns, and they help guide the orientation of secondary structure assembly. Second, in the Zhang‐Server pipeline, weakly scoring threading templates are re‐ordered by their structural similarity to the ab initio folding models and are then reassembled by I‐TASSER‐based structure assembly simulations. Compared to the QUARK pipeline, the I‐TASSER pipeline successfully modeled 60% more domains, with lengths up to 204 residues, with a TM‐score above 0.4. The robustness of the I‐TASSER pipeline may stem from the composite fragment‐assembly simulations, which combine structures from both ab initio folding and threading template refinement. Despite these promising cases, challenges remain in long‐range beta‐strand folding, domain parsing, and the uncertainty of secondary structure prediction; the last of these was found to affect nearly all aspects of FM structure prediction, from fragment identification, target classification, and structure assembly to final model selection. Significant efforts are needed to solve these problems before real progress on FM can be made. Proteins 2016; 84(Suppl 1):76–86. © 2015 Wiley Periodicals, Inc.
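For readers unfamiliar with the TM‐score threshold used above, the following is a minimal sketch of the standard TM‐score formula, evaluated for one fixed superposition. It is not code from the QUARK or I‐TASSER pipelines. The function names `tm_d0` and `tm_score_fixed` are hypothetical, the input is assumed to be a list of Cα distances (in Å) between already aligned and superposed residue pairs, and the 0.5 Å floor on the distance scale is an assumption that follows common implementations. The real TM‐score additionally maximizes this sum over all superpositions, which this sketch does not do.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in Angstrom)."""
    # Assumption: very short chains fall back to a 0.5 A floor,
    # which also avoids a fractional power of a negative number.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length: int) -> float:
    """TM-score term sum for ONE given superposition.

    distances: C-alpha distances (Angstrom) of aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    Unaligned target residues contribute zero, so the score is
    normalized by target_length, not by len(distances).
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue target, 80 aligned pairs at 2.0 A each.
    print(round(tm_score_fixed([2.0] * 80, 100), 3))
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score lies in (0, 1], and a pair whose distance equals d0 contributes exactly 0.5.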
