DEEPCON: Protein Contact Prediction using Dilated Convolutional Neural Networks with Dropout

Background Exciting new opportunities have arisen to solve the protein contact prediction problem from the progress in neural networks and the availability of a large number of homologous sequences through high-throughput sequencing. In this work, we study how deep convolutional neural network methods (ConvNets) may be best designed and developed to solve this long-standing problem. Method With publicly available datasets, we designed and trained various ConvNet architectures. We tested several recent deep learning techniques including wide residual networks, dropouts, and dilated convolutions. We studied the improvements in the precision of medium-range and long-range contacts, and compared the performance of our best architectures with the ones used in existing state-of-the-art methods. Results The proposed ConvNet architectures predict contacts with significantly more precision than the architectures used in several state-of-the-art methods. When trained using the DeepCov dataset consisting of 3,456 proteins and tested on PSICOV dataset of 150 proteins, our architectures achieve up to 15% higher precision when L/2 long-range contacts are evaluated. Similarly, when trained using the DNCON2 dataset consisting of 1,426 proteins and tested on 84 protein domains in the CASP12 dataset, our single network achieves 4.8% higher precision than the ensembled DNCON2 method when top L long-range contacts are evaluated. DEEPCON will be made publicly available at https://github.com/badriadhikari/DEEPCON/.

[1]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[2]  A. Tramontano,et al.  Critical assessment of methods of protein structure prediction (CASP)—Round XII , 2018, Proteins.

[3]  David T. Jones,et al.  High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features , 2018, Bioinform..

[4]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[5]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[6]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[7]  Marcin J. Skwark,et al.  Predicting accurate contacts in thousands of Pfam domain families using PconsC3 , 2017, Bioinform..

[8]  Burkhard Rost,et al.  FreeContact: fast and free software for protein contact prediction from residue co-evolution , 2014, BMC Bioinformatics.

[9]  Jie Hou,et al.  DNCON2: improved protein contact prediction using two-level deep convolutional neural networks , 2017, bioRxiv.

[10]  Zhen Li,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016, bioRxiv.

[11]  A. Tramontano,et al.  New encouraging developments in contact prediction: Assessment of the CASP11 results , 2016, Proteins.

[12]  Peter B. McGarvey,et al.  UniRef: comprehensive and non-redundant UniProt reference clusters , 2007, Bioinform..

[13]  Björn Wallner,et al.  rawMSA: End-to-end Deep Learning Makes Protein Sequence Profiles and Feature Extraction obsolete , 2018, bioRxiv.

[14]  Robert D. Finn,et al.  HMMER web server: interactive sequence similarity searching , 2011, Nucleic Acids Res..

[15]  Maria Jesus Martin,et al.  Uniclust databases of clustered and deeply annotated protein sequences and alignments , 2016, Nucleic Acids Res..

[16]  Markus Gruber,et al.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..

[17]  A. Tramontano,et al.  Critical assessment of methods of protein structure prediction (CASP)—round IX , 2011, Proteins.

[18]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[19]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[20]  Mirco Michel,et al.  PconsC4: fast, free, easy, and accurate contact predictions , 2018 .

[21]  David T. Jones,et al.  Accurate contact predictions using covariation techniques and machine learning , 2015, Proteins.

[22]  Luca Viganò,et al.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , 2015, IWSEC 2015.

[23]  Johannes Söding,et al.  Protein homology detection by HMM?CHMM comparison , 2005, Bioinform..