Assessment of contact predictions in CASP12: Co‐evolution and deep learning coming of age

Following up on the encouraging results of residue‐residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions from 34 groups for 38 domains classified as free modeling targets, which are not accessible to homology‐based modeling because structural templates are lacking. CASP11 saw the rise of coevolution‐based methods, which outperformed other approaches. The improvement of these methods, coupled with machine learning and sequence database growth, is most likely the main driver of a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many accessible homologous sequences, precisions above 90% were achieved, with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single‐domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
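The precision figures quoted above can be illustrated with a minimal sketch. Precision is the fraction of predicted contacts that are present in the native structure, TP / (TP + FP). Everything else in this example is an assumption made for illustration and is not taken from the assessment itself: the function name `contact_precision`, the `(i, j, score)` tuple layout, the toy data, and the choice of evaluating a fixed number of top-scoring predictions. Details such as the distance cutoff that defines a contact or any sequence-separation filter are not specified in this abstract and are left out.

```python
def contact_precision(predicted, native, top_n):
    """Precision of the top_n highest-scoring predicted contacts.

    predicted: list of (i, j, score) tuples (hypothetical layout).
    native:    set of (i, j) residue pairs with i < j that are in
               contact in the experimental structure.
    """
    # Rank predictions by confidence score, highest first, and keep top_n.
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    # A prediction is a true positive if the (order-normalized) pair is native.
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in native)
    return hits / len(ranked)


# Toy data, invented purely for illustration.
predicted = [(1, 20, 0.9), (3, 30, 0.8), (5, 40, 0.4), (2, 25, 0.7)]
native = {(1, 20), (2, 25), (5, 41)}

print(contact_precision(predicted, native, top_n=3))
```

In this toy case, two of the three highest-scoring pairs are native contacts, so the precision is 2/3. An average precision over targets would then be the mean of this value across all evaluated domains.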
