Analysis of distance‐based protein structure prediction by deep learning in CASP13

This paper reports the CASP13 results of the distance‐based contact prediction, threading, and folding methods implemented in three RaptorX servers, which build on the deep convolutional residual neural network (ResNet) method that we introduced for contact prediction in CASP12. On the 32 CASP13 FM (free‐modeling) targets, which have a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and nearly the best 3D structure modeling among all server groups without time‐consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long‐range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TM‐score > 0.5) for 18 of the 32 targets. Furthermore, RaptorX predicted correct folds for all FM targets with >300 residues (T0950‐D1, T0969‐D1, and T1000‐D2) and generated the best 3D models for T0950‐D1 and T0969‐D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distances are more useful than predicted contacts for both template‐based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper discusses the progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
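The top L/5, L/2, and L long‐range precision figures quoted above can be illustrated with a minimal sketch. It is not the RaptorX or CASP evaluation code. It assumes the usual conventions that two residues are in contact when their Cβ atoms are closer than 8 Å and that "long‐range" means a sequence separation of at least 24 residues. The function names, parameters, and toy data are hypothetical.

```python
import numpy as np

def contacts_from_distances(dist, cutoff=8.0):
    """Binary contact map from a distance matrix (assumed cutoff: 8 Angstrom)."""
    return dist < cutoff

def top_long_range_precision(prob, contact, divisor=1, min_sep=24):
    """Fraction of true contacts among the top L/divisor long-range predictions.

    prob    : (L, L) array of predicted contact probabilities
    contact : (L, L) boolean array of native contacts
    min_sep : minimum sequence separation |i - j| (assumed: 24 for long-range)
    """
    L = prob.shape[0]
    # Upper-triangle residue pairs with j - i >= min_sep
    i, j = np.triu_indices(L, k=min_sep)
    n_top = max(1, L // divisor)
    # Indices of the n_top highest-scoring long-range pairs
    order = np.argsort(-prob[i, j], kind="stable")[:n_top]
    return float(contact[i[order], j[order]].mean())

# Toy example with L = 30 (illustrative data only)
L = 30
prob = np.zeros((L, L))
dist = np.full((L, L), 20.0)          # every pair far apart by default
pairs = [(0, 24), (1, 25), (2, 26), (3, 27), (4, 28), (5, 29)]
for rank, (a, b) in enumerate(pairs):
    prob[a, b] = 0.9 - 0.1 * rank     # decreasing confidence
prob[0, 5] = 0.99                     # short-range pair, must be ignored
for a, b in [(0, 24), (1, 25), (3, 27), (0, 5)]:
    dist[a, b] = 6.0                  # native contacts
contact = contacts_from_distances(dist)

print(top_long_range_precision(prob, contact, divisor=5))   # top L/5
print(top_long_range_precision(prob, contact, divisor=10))  # top L/10
```

In this toy case the six highest-scoring long‐range pairs contain three native contacts, so the top L/5 precision is 0.5. The high-scoring pair (0, 5) does not count because its sequence separation is below the assumed threshold.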
