CONFOLD2: improved contact-driven ab initio protein structure modeling

Background: Contact-guided protein structure prediction methods are becoming increasingly successful because of recent advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools are needed that can quickly build good-quality tertiary structural models from predicted contacts.

Results: We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it can be used effectively for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of the input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to select the top five. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets.

Conclusion: CONFOLD2 can quickly generate the top five structural models for a protein sequence when its predicted secondary structure and contacts are at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/.
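The idea of building models from several contact subsets under a soft restraint penalty can be illustrated with a minimal Python sketch. This is not the CONFOLD2 implementation: the function names, the subset fractions, the 8 Å upper bound, and the simplified flat-bottom penalty (quadratic near the bound, linear beyond it) are all assumptions made for illustration, and the actual soft square function used by the method may differ in form and parameters.

```python
from typing import List, Sequence, Tuple

# (residue i, residue j, predicted confidence) -- hypothetical input format
Contact = Tuple[int, int, float]


def top_contact_subsets(
    contacts: Sequence[Contact],
    seq_len: int,
    fractions: Sequence[float] = (0.5, 1.0, 2.0),  # assumed multiples of L
) -> List[List[Contact]]:
    """Return one subset per fraction: the top round(f * L) contacts by confidence."""
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    subsets = []
    for f in fractions:
        k = max(1, round(f * seq_len))
        subsets.append(ranked[:k])  # slicing past the end simply returns all contacts
    return subsets


def soft_square_penalty(distance: float, upper: float = 8.0, switch: float = 1.0) -> float:
    """Simplified soft restraint penalty (an assumption, not the exact published form).

    Zero when the distance satisfies the upper bound, quadratic for small
    violations, and linear beyond `switch` so that badly violated (possibly
    false-positive) contacts do not dominate the total energy.
    """
    violation = distance - upper
    if violation <= 0.0:
        return 0.0
    if violation <= switch:
        return violation ** 2
    # Linear continuation with matching value and slope at the switch point.
    return switch ** 2 + 2.0 * switch * (violation - switch)


if __name__ == "__main__":
    demo = [(1, 10, 0.9), (2, 20, 0.4), (3, 30, 0.7), (4, 40, 0.2)]
    for subset in top_contact_subsets(demo, seq_len=4, fractions=(0.5, 1.0)):
        print(len(subset), subset)
    print(soft_square_penalty(7.0), soft_square_penalty(8.5), soft_square_penalty(10.0))
```

In a pipeline of this kind, each subset would be handed to a separate model-building run, and the resulting models would then be pooled and clustered; those steps are omitted here because the abstract does not specify them.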
