Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

Prediction of residue-residue distance relationships (e.g. contacts) has become the key direction to advance protein tertiary structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, contact distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction, in addition to an update of other components such as template library, sequence database, and alignment tools. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based protein structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue-residue features such as co-evolution scores to substantially improve inter-residue contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets from scratch. Deep learning also successfully integrated 1D structural features, 2D contact information, and 3D structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system in the CASP13 experiment clearly shows that protein contact distance prediction and model selection driven by powerful deep learning holds the key of solving protein structure prediction problem. However, there are still major challenges in accurately predicting protein contact distance when there are few homologous sequences to generate co-evolutionary signals, folding proteins from noisy contact distances, and ranking models of hard targets.

[1]  Renzhi Cao,et al.  UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling , 2016, Bioinform..

[2]  Yang Zhang,et al.  A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction , 2010, PloS one.

[3]  Jianlin Cheng,et al.  Predicting protein residue-residue contacts using deep networks and boosting , 2012, Bioinform..

[4]  Georgios A. Pavlopoulos,et al.  Protein structure determination using metagenome sequence data , 2017, Science.

[5]  Lisa N Kinch,et al.  Evaluation of free modeling targets in CASP11 and ROLL , 2016, Proteins.

[6]  András Fiser,et al.  Effects of amino acid composition, finite size of proteins, and sparse statistics on distance‐dependent statistical pair potentials , 2007, Proteins.

[7]  Jianzhu Ma,et al.  RaptorX server: a resource for template-based protein structure modeling. , 2014, Methods in molecular biology.

[8]  Roland L. Dunbrack,et al.  Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. , 1997, Journal of molecular biology.

[9]  Jens Meiler,et al.  ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. , 2011, Methods in enzymology.

[10]  Kliment Olechnovic,et al.  Voronota: A fast and reliable tool for computing the vertices of the Voronoi diagram of atomic balls , 2014, J. Comput. Chem..

[11]  Andriy Kryshtafovych,et al.  Assessment of contact predictions in CASP12: Co‐evolution and deep learning coming of age , 2017, Proteins.

[12]  Debswapna Bhattacharya,et al.  De novo protein conformational sampling using a probabilistic graphical model , 2015, Scientific Reports.

[13]  Robert D. Finn,et al.  HMMER web server: interactive sequence similarity searching , 2011, Nucleic Acids Res..

[14]  Jianlin Cheng,et al.  APOLLO: a quality assessment service for single and multiple protein models , 2011, Bioinform..

[15]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[16]  Martin Madera,et al.  Profile Comparer: a program for scoring and aligning profile hidden Markov models , 2008, Bioinform..

[17]  Björn Wallner,et al.  Improved model quality assessment using ProQ2 , 2012, BMC Bioinformatics.

[18]  Markus Gruber,et al.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..

[19]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[20]  Jianlin Cheng,et al.  MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8 , 2010, Bioinform..

[21]  Xin Deng,et al.  PreDisorder: ab initio sequence-based prediction of protein disordered regions , 2009, BMC Bioinformatics.

[22]  A. Biegert,et al.  Sequence context-specific profiles for homology searching , 2009, Proceedings of the National Academy of Sciences.

[23]  Sergei Grudinin,et al.  Smooth orientation-dependent scoring function for coarse-grained protein quality assessment , 2018, Bioinform..

[24]  Badri Adhikari,et al.  Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts , 2017, BMC Bioinformatics.

[25]  A. Tramontano,et al.  Critical assessment of methods of protein structure prediction: Progress and new directions in round XI , 2016, Proteins.

[26]  Jie Hou,et al.  DeepSF: deep convolutional neural network for mapping protein sequences to folds , 2017, Bioinform..

[27]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[28]  Thomas A. Hopf,et al.  Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.

[29]  A. Brunger Version 1.2 of the Crystallography and NMR system , 2007, Nature Protocols.

[30]  Burkhard Rost,et al.  FreeContact: fast and free software for protein contact prediction from residue co-evolution , 2014, BMC Bioinformatics.

[31]  Jianpeng Ma,et al.  OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. , 2008, Journal of molecular biology.

[32]  A. Sali,et al.  Statistical potential for assessment and prediction of protein structures , 2006, Protein science : a publication of the Protein Society.

[33]  Renzhi Cao,et al.  Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. , 2016, Methods.

[34]  Jilong Li,et al.  Designing and benchmarking the MULTICOM protein structure prediction system , 2013, BMC Structural Biology.

[35]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[36]  Jilong Li,et al.  Large-scale model quality assessment for improving protein tertiary structure prediction , 2015, Bioinform..

[37]  Jinbo Xu,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016 .

[38]  Jianlin Cheng,et al.  3D Genome Structure Modeling by Lorentzian Objective Function , 2017, BCB.

[39]  Anders Krogh,et al.  SAM: SEQUENCE ALIGNMENT AND MODELING SOFTWARE SYSTEM , 1995 .

[40]  Jilong Li,et al.  Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11 , 2016, Proteins.

[41]  Magnus Ekeberg,et al.  Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences , 2014, J. Comput. Phys..

[42]  David T. Jones,et al.  High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features , 2018, Bioinform..

[43]  Johannes Söding,et al.  Protein homology detection by HMM?CHMM comparison , 2005, Bioinform..

[44]  Andriy Kryshtafovych,et al.  Assessment of hard target modeling in CASP12 reveals an emerging role of alignment‐based contact prediction methods , 2018, Proteins.

[45]  Narayanan Eswar,et al.  Protein structure modeling with MODELLER. , 2008, Methods in molecular biology.

[46]  C. Sander,et al.  Direct-coupling analysis of residue coevolution captures native contacts across many protein families , 2011, Proceedings of the National Academy of Sciences.

[47]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[48]  Renzhi Cao,et al.  3Drefine: an interactive web server for efficient protein structure refinement , 2016, Nucleic Acids Res..

[49]  Mirco Michel,et al.  PconsC4: fast, accurate and hassle-free contact predictions , 2019, Bioinform..

[50]  Dong Xu,et al.  FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking , 2014, Bioinform..

[51]  Jianlin Cheng,et al.  Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data , 2014, Nucleic acids research.

[52]  Jie Hou,et al.  DNCON2: improved protein contact prediction using two-level deep convolutional neural networks , 2017, bioRxiv.

[53]  Kuldip K. Paliwal,et al.  Capturing non‐local interactions by long short‐term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility , 2017, Bioinform..

[54]  Badri Adhikari,et al.  CONFOLD2: improved contact-driven ab initio protein structure modeling , 2018, BMC Bioinformatics.

[55]  Zhen Li,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016, bioRxiv.

[56]  Miao Sun,et al.  QAcon: single model quality assessment using protein structural and contact information with machine learning techniques , 2016, Bioinform..

[57]  A. Tramontano,et al.  New encouraging developments in contact prediction: Assessment of the CASP11 results , 2016, Proteins.

[58]  Marco Biasini,et al.  lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests , 2013, Bioinform..

[59]  Pierre Baldi,et al.  SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity , 2014, Bioinform..

[60]  Jianlin Cheng,et al.  CONFOLD: Residue‐residue contact‐guided ab initio protein folding , 2015, Proteins.

[61]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[62]  Kuldip K. Paliwal,et al.  Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks , 2018, Bioinform..

[63]  Arne Elofsson,et al.  ProQ3: Improved model quality assessments using Rosetta energy terms , 2016, Scientific Reports.

[64]  Liam J. McGuffin,et al.  Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments , 2010, Bioinform..

[65]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[66]  Jie Hou,et al.  ConEVA: a toolbox for comprehensive assessment of protein contacts , 2016, BMC Bioinformatics.