DNCON2: improved protein contact prediction using two-level deep convolutional neural networks

Abstract Motivation Significant improvements in the prediction of protein residue–residue contacts are observed in the recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. Results In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks—the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, two-level approach to prediction, use of the state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. Availability and implementation The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Supplementary information Supplementary data are available at Bioinformatics online.

[1]  Marcin J. Skwark,et al.  Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns , 2014, PLoS Comput. Biol..

[2]  Jianlin Cheng,et al.  CONFOLD: Residue‐residue contact‐guided ab initio protein folding , 2015, Proteins.

[3]  Anna Tramontano,et al.  Evaluation of residue–residue contact prediction in CASP10 , 2014, Proteins.

[4]  David C. Jones Predicting novel protein folds by using FRAGFOLD , 2001, Proteins.

[5]  David E. Kim,et al.  Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta , 2016, Proteins.

[6]  Marcin J. Skwark,et al.  PconsFold: improved contact predictions improve protein models , 2014, Bioinform..

[7]  A. Tramontano,et al.  Evaluation of residue–residue contact predictions in CASP9 , 2011, Proteins.

[8]  Hong-Bin Shen,et al.  Integration of QUARK and I‐TASSER for Ab Initio Protein Structure Prediction in CASP11 , 2016, Proteins.

[9]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[10]  Jinbo Xu,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016 .

[11]  Sean R. Eddy,et al.  Hidden Markov model speed heuristic and iterative HMM search procedure , 2010, BMC Bioinformatics.

[12]  Janusz M. Bujnicki,et al.  GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function , 2015, Bioinform..

[13]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[14]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[15]  Jianlin Cheng,et al.  A study and benchmark of DNcon: a method for protein residue-residue contact prediction using deep networks , 2013, BMC Bioinformatics.

[16]  Mirco Michel,et al.  Large-scale structure prediction by improved contact predictions and model quality assessment , 2017 .

[17]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[18]  Jie Hou,et al.  ConEVA: a toolbox for comprehensive assessment of protein contacts , 2016, BMC Bioinformatics.

[19]  David T. Jones,et al.  De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts , 2014, PloS one.

[20]  Burkhard Rost,et al.  FreeContact: fast and free software for protein contact prediction from residue co-evolution , 2014, BMC Bioinformatics.

[21]  David T. Jones,et al.  Accurate contact predictions using covariation techniques and machine learning , 2015, Proteins.

[22]  Yang Zhang,et al.  Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field , 2012, Proteins.

[23]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[24]  Jianlin Cheng,et al.  Predicting protein residue-residue contacts using deep networks and boosting , 2012, Bioinform..

[25]  Oliver Brock,et al.  Analysis of free modeling predictions by RBO aleph in CASP11 , 2016, Proteins.

[26]  Marcin J. Skwark,et al.  Predicting accurate contacts in thousands of Pfam domain families using PconsC3 , 2017, Bioinform..

[27]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[28]  Thomas A. Hopf,et al.  Protein structure prediction from sequence variation , 2012, Nature Biotechnology.

[29]  R Dustin Schaeffer,et al.  CASP 11 target classification , 2016, Proteins.

[30]  Markus Gruber,et al.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..

[31]  Pierre Baldi,et al.  SCRATCH: a protein structure and structural feature prediction server , 2005, Nucleic Acids Res..

[32]  A. Tramontano,et al.  New encouraging developments in contact prediction: Assessment of the CASP11 results , 2016, Proteins.

[33]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.