Topology prediction improvement of α-helical transmembrane proteins through helix-tail modeling and multiscale deep learning fusion.

Transmembrane proteins (TMPs) play important roles in many biological processes, such as cell recognition and communication. Their structures are crucial for revealing their complex functions but are hard to obtain experimentally. A variety of computational algorithms have been proposed to fill this gap by predicting structures from primary sequences. In this study, we focus on α-helical TMPs and develop a multiscale deep learning pipeline, MemBrain 3.0, to improve topology prediction. The new protocol includes two submodules. The first module is transmembrane helix (TMH) prediction. By incorporating tail modeling, it can accurately predict TMHs together with their tail parts. Its prediction engine combines a multiscale deep learning model with a dynamic threshold strategy. The deep learning model consists of a small-scale, residue-based residual neural network and a large-scale, entire-sequence-based residual neural network. The dynamic threshold strategy binarizes the raw prediction scores and addresses the under-split problem, in which adjacent TMHs are merged into a single predicted segment. The second module is orientation prediction, which consists of a support vector machine (SVM) classifier and a new Max-Min assignment (MMA) strategy. A key merit of MemBrain 3.0 is its decision mode, composed of the dynamic threshold strategy and the MMA strategy, which makes it more effective for hard TMHs such as half TMHs, back-to-back TMHs, and long TMHs. Systematic experiments demonstrate the efficacy of the new model, which is available at: www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
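The abstract does not say how the dynamic threshold is computed. As a rough illustration only, the Python sketch below assumes one possible segment-adaptive rule. Residues are first binarized with a global threshold. Any predicted segment longer than a hypothetical length limit is then re-thresholded at progressively higher cut-offs until it breaks into at least two sufficiently long pieces, which is one way to counter the under-splitting of back-to-back TMHs. The function names `segments` and `dynamic_binarize` and all parameter values (`base`, `long_len`, `min_piece`, `step`, `max_t`) are hypothetical and are not taken from MemBrain 3.0.

```python
import numpy as np


def segments(mask):
    """Return half-open (start, end) index pairs for each run of True values."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    diff = np.diff(padded.astype(int))
    starts = np.flatnonzero(diff == 1)   # False -> True transitions
    ends = np.flatnonzero(diff == -1)    # True -> False transitions
    return list(zip(starts.tolist(), ends.tolist()))


def dynamic_binarize(scores, base=0.5, long_len=30, min_piece=8,
                     step=0.05, max_t=0.95):
    """Hypothetical segment-adaptive binarization of per-residue TMH scores.

    Assumption: this is NOT the published MemBrain 3.0 rule, only a sketch
    of how a threshold could be raised locally to split over-long segments.
    """
    scores = np.asarray(scores, dtype=float)
    labels = scores >= base                      # global first pass
    n_steps = int(round((max_t - base) / step))
    for start, end in segments(labels):
        if end - start <= long_len:
            continue                             # short enough, keep as is
        seg = scores[start:end]
        for k in range(1, n_steps + 1):
            t = base + k * step                  # raise the local threshold
            pieces = segments(seg >= t)
            # Accept a split only if it yields >= 2 pieces, all long enough.
            if len(pieces) >= 2 and all(e - s >= min_piece for s, e in pieces):
                # Clear only the gaps between pieces, so the outer
                # boundaries found with the base threshold are preserved.
                for (_, gap_start), (gap_end, _) in zip(pieces, pieces[1:]):
                    labels[start + gap_start:start + gap_end] = False
                break
    return labels


# Two high-scoring helices joined by a short, weaker linker.
scores = [0.1] * 5 + [0.9] * 20 + [0.58] * 3 + [0.9] * 20 + [0.1] * 5
print(segments(dynamic_binarize(scores)))
```

Because a split is accepted only when every resulting piece is at least `min_piece` residues long, this sketch leaves a long segment with uniformly high scores (for example, a genuine long TMH) intact, and it keeps the outer boundaries set by the base threshold.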
