Template‐based and free modeling of I‐TASSER and QUARK pipelines using predicted contact maps in CASP12

We develop two complementary pipelines, “Zhang‐Server” and “QUARK”, built on the I‐TASSER and QUARK methods for template‐based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I‐TASSER and QUARK successfully folds three medium‐size FM targets of more than 150 residues, although the interplay between the two pipelines still awaits further optimization. The newly developed sequence‐based contact predictor NeBcon plays a critical role in enhancing the quality of the models produced by the new pipelines, particularly for FM targets. Including NeBcon‐predicted contacts as restraints in the QUARK simulations yields an average TM‐score of 0.41 for the best of the top five predicted models, which is 37% higher than that of QUARK simulations without contacts. In particular, seven targets are converted from non‐foldable to foldable (TM‐score >0.5) by the use of contact restraints in the simulations. An additional feature of the current pipelines is local structure quality prediction by ResQ, which provides a robust residue‐level estimate of modeling error. Despite these successes, significant challenges remain in ab initio modeling of multi‐domain proteins and in folding β‐proteins with complicated topologies bound by long‐range strand‐strand interactions. Improvements in domain boundary and long‐range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to addressing these issues seen in the CASP12 experiment.
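The TM‐score figures quoted above (the 0.41 average and the >0.5 foldability cutoff) can be made concrete with a minimal sketch of the standard TM‐score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. The sketch assumes the model has already been superposed on the native structure and that per‐residue distances are supplied by the caller; the full TM‐score additionally searches for the superposition that maximizes this sum. The function names and the 0.5 Å floor on d0 are assumptions of this illustration, not part of the pipelines described here.

```python
from typing import Sequence


def tm_score_d0(target_length: int, floor: float = 0.5) -> float:
    """Length-dependent distance scale d0 in angstroms.

    The floor for very short chains is an assumption of this sketch.
    """
    if target_length <= 15:
        # Avoid a fractional power of a non-positive number.
        return floor
    return max(floor, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: C-alpha distances (angstroms) for the aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def is_foldable(score: float, cutoff: float = 0.5) -> bool:
    """Foldability criterion used in the text: TM-score strictly above 0.5."""
    return score > cutoff


if __name__ == "__main__":
    length = 150
    d0 = tm_score_d0(length)
    # Hypothetical model in which every residue deviates by exactly d0.
    score = tm_score([d0] * length, length)
    print(f"d0 = {d0:.2f} A, TM-score = {score:.2f}, foldable = {is_foldable(score)}")
```

As a side calculation from the reported numbers, an average of 0.41 that is 37% higher than the contact‐free runs implies a contact‐free average of roughly 0.41 / 1.37 ≈ 0.30.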
