Predicting membrane protein contacts from non-membrane proteins by deep transfer learning

Computational prediction of membrane protein (MP) structures is very challenging, partly because too few solved structures are available for homology modeling. Direct evolutionary coupling analysis (DCA) has recently shed some light on protein contact prediction and, accordingly, on contact-assisted folding, but DCA is effective only on some very large families because it uses information from only a single protein family. This paper presents a deep transfer learning method that significantly improves MP contact prediction by learning contact patterns and the complex sequence-contact relationship from thousands of non-membrane proteins (non-MPs). Tested on 510 non-redundant MPs, our deep model learned from non-MPs only has a top L/10 long-range contact prediction accuracy of 0.69, better than our deep model trained on MPs only (0.63) and much better than a representative DCA method, CCMpred (0.47), and the CASP11 winner, MetaPSICOV (0.55). The accuracy of our deep model improves further to 0.72 when it is trained on a mix of non-MPs and MPs. When only contacts in transmembrane regions are evaluated, our method has top L/10 long-range accuracy of 0.62, 0.57, and 0.53 when trained on a mix of non-MPs and MPs, on non-MPs only, and on MPs only, respectively, still much better than MetaPSICOV (0.45) and CCMpred (0.40). These results suggest that the sequence-structure relationship learned by our deep model from non-MPs generalizes well to MP contact prediction. Improved contact prediction also leads to better contact-assisted folding. Using only the top predicted contacts as restraints, our deep learning method can fold 160 and 200 of the 510 MPs with TMscore > 0.6 when trained on non-MPs only and on a mix of non-MPs and MPs, respectively, while CCMpred and MetaPSICOV can do so for only 56 and 77 MPs, respectively. Our contact-assisted folding also greatly outperforms homology modeling.
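The top L/10 long-range accuracy quoted above can be illustrated with a minimal sketch, where L is the protein sequence length. The abstract does not spell out the details, so the following are assumptions: a pair counts as long-range when its sequence separation is at least 24 residues (a common convention), the predicted and native maps are symmetric L x L arrays, and the function name and parameters are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def top_long_range_accuracy(pred_prob, native_contact, divisor=10, min_sep=24):
    """Fraction of the top L/divisor predicted long-range pairs that are native contacts.

    pred_prob:      (L, L) array of predicted contact probabilities.
    native_contact: (L, L) boolean array, True where the native structure has a contact.
    min_sep:        assumed long-range threshold on |i - j| (not stated in the abstract).
    """
    L = pred_prob.shape[0]
    k = max(1, L // divisor)

    # Upper-triangle index pairs (i, j) with j - i >= min_sep, i.e. long-range only.
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        return float("nan")  # protein too short to have any long-range pair

    # Rank candidate pairs by predicted probability, highest first, and keep the top k.
    order = np.argsort(-pred_prob[i, j], kind="stable")[:k]
    return float(native_contact[i[order], j[order]].mean())
```

For a protein of length 50, for instance, this scores the 5 highest-ranked pairs with sequence separation of at least 24 and ignores any shorter-range pair, however confident the prediction.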
