Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts

BackgroundResidue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures.ResultsUsing a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts – adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that by simply using low ranked predicted contacts as non-contacts and adding them as additional restraints, can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score.ConclusionsOur findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.

[1]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[2]  Piero Fariselli,et al.  FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps , 2008, Bioinform..

[3]  Steven E. Brenner,et al.  SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures , 2013, Nucleic Acids Res..

[4]  Liam J. McGuffin,et al.  The PSIPRED protein structure prediction server , 2000, Bioinform..

[5]  Zhendong Bei,et al.  COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming , 2016, Proteins.

[6]  Jianlin Cheng,et al.  NNcon: improved protein contact map prediction using 2D-recursive neural networks , 2009, Nucleic Acids Res..

[7]  Piero Fariselli,et al.  Reconstruction of 3D Structures From Protein Contact Maps , 2008, IEEE ACM Trans. Comput. Biol. Bioinform..

[8]  Piero Fariselli,et al.  Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure , 2011, BioData Mining.

[9]  Jianlin Cheng,et al.  CONFOLD: Residue‐residue contact‐guided ab initio protein folding , 2015, Proteins.

[10]  Sean R. Eddy,et al.  Hidden Markov model speed heuristic and iterative HMM search procedure , 2010, BMC Bioinformatics.

[11]  M Vendruscolo,et al.  Recovery of protein structure from contact maps. , 1997, Folding & design.

[12]  Michael Lappe,et al.  Defining an Essence of Structure Determining Residue Contacts in Proteins , 2009, PLoS Comput. Biol..

[13]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[14]  David T. Jones,et al.  Accurate contact predictions using covariation techniques and machine learning , 2015, Proteins.

[15]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[16]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[17]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[18]  Jilong Li,et al.  An Improved Integration of Template-Based and Template-Free Protein Structure Modeling Methods and its Assessment in CASP11. , 2015, Protein and peptide letters.

[19]  Pierre Baldi,et al.  SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity , 2014, Bioinform..

[20]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[21]  Michael Lappe,et al.  Optimal contact definition for reconstruction of Contact Maps , 2010, BMC Bioinformatics.