Folding membrane proteins by deep transfer learning

Computational elucidation of membrane protein (MP) structures is challenging, partly because too few solved structures are available for homology modeling. Here we describe a high-throughput deep transfer learning method that first predicts MP contacts by learning from non-membrane proteins (non-MPs) and then predicts three-dimensional structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method achieves contact prediction accuracy at least 0.18 higher than existing methods, predicts correct folds for 218 MPs (TM-score > 0.6), and generates three-dimensional models with RMSD less than 4 Å and 5 Å for 57 and 108 MPs, respectively. A rigorous blind test in the continuous automated model evaluation (CAMEO) project shows that our method predicted high-resolution three-dimensional models for two recent test MPs of 210 residues with RMSD ∼2 Å. We estimate that our method could predict correct folds for 1,345–1,871 reviewed human multi-pass MPs, including a few hundred new folds, which should facilitate the discovery of drugs targeting membrane proteins.
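The step from predicted contacts to distance restraints, and the way contact accuracy is typically scored, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the choice of keeping the top 2L pairs, the minimum sequence separation of 6, and the 8 Å upper bound are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Set, Tuple

# (residue i, residue j, predicted contact probability); hypothetical format
Contact = Tuple[int, int, float]


def top_contacts(pred: List[Contact], seq_len: int,
                 factor: float = 2.0, min_sep: int = 6) -> List[Contact]:
    """Keep the highest-scoring factor * L pairs (L = sequence length).

    Pairs closer than min_sep in sequence are dropped first.
    Both defaults are illustrative assumptions.
    """
    eligible = [c for c in pred if abs(c[0] - c[1]) >= min_sep]
    eligible.sort(key=lambda c: c[2], reverse=True)
    return eligible[: int(factor * seq_len)]


def to_restraints(contacts: List[Contact],
                  upper: float = 8.0) -> List[Tuple[int, int, float]]:
    """Turn each selected contact into an upper-bound distance restraint.

    The 8 Angstrom bound is an assumed contact threshold.
    """
    return [(i, j, upper) for i, j, _ in contacts]


def precision(contacts: List[Contact], native: Set[Tuple[int, int]]) -> float:
    """Fraction of selected contacts that are present in the native structure."""
    if not contacts:
        return 0.0
    hits = sum(1 for i, j, _ in contacts if (min(i, j), max(i, j)) in native)
    return hits / len(contacts)


# Toy example with made-up numbers (not data from the paper).
pred = [(1, 10, 0.9), (2, 3, 0.99), (5, 20, 0.4), (3, 15, 0.7)]
selected = top_contacts(pred, seq_len=1)   # keeps int(2.0 * 1) = 2 pairs
restraints = to_restraints(selected)
score = precision(selected, native={(1, 10)})
```

In this toy input the pair (2, 3) has the highest probability but is discarded because the residues are adjacent in sequence. The resulting restraint list would then be passed to a folding program, which is outside the scope of this sketch.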
