TMSEG: Novel prediction of transmembrane helices

Transmembrane proteins (TMPs) are important drug targets because they are essential for signaling, regulation, and transport. Despite important breakthroughs, experimental structure determination remains challenging for TMPs. Various methods have bridged the gap by predicting transmembrane helices (TMHs), but room for improvement remains. Here, we present TMSEG, a novel method that identifies TMPs and accurately predicts their TMHs and topology. The method combines machine learning with empirical filters. When tested on a non‐redundant dataset of 41 TMPs and 285 soluble proteins and evaluated with strict performance measures, TMSEG outperformed the state‐of‐the‐art in our hands. TMSEG correctly distinguished helical TMPs from other proteins with a sensitivity of 98 ± 2% and a false positive rate as low as 3 ± 1%. Individual TMHs were predicted with a precision of 87 ± 3% and a recall of 84 ± 3%. Furthermore, in 63 ± 6% of helical TMPs the placement of all TMHs and their inside/outside topology were correctly predicted. Two main features distinguish TMSEG from other methods. First, it significantly reduces the errors in finding all helical TMPs in an organism. For example, in human this amounts to 200 and 1600 fewer misclassifications than the second and third best methods available, respectively, and 4400 fewer than a simple hydrophobicity‐based method. Second, TMSEG provides an add‐on improvement from which any existing method can benefit. Proteins 2016; 84:1706–1716. © 2016 Wiley Periodicals, Inc.
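The performance measures quoted above follow the standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the counts in the usage lines are assumptions chosen only to be roughly consistent with the rounded percentages and the dataset size (41 TMPs, 285 soluble proteins). The abstract does not report the underlying counts, and the criterion for deciding that a predicted helix is "correct" is not specified here.

```python
def protein_level_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity and false positive rate for TMP vs. non-TMP classification.

    tp: TMPs predicted as TMPs          fn: TMPs missed
    fp: soluble proteins called TMPs    tn: soluble proteins called soluble
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one TMP and one non-TMP")
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


def helix_level_rates(correct: int, predicted: int, observed: int) -> dict:
    """Precision and recall for individual transmembrane helices.

    correct:   predicted helices that match an observed helix
    predicted: all predicted helices
    observed:  all observed (annotated) helices
    """
    if predicted == 0 or observed == 0:
        raise ValueError("need at least one predicted and one observed helix")
    return {
        "precision": correct / predicted,
        "recall": correct / observed,
    }


# Hypothetical counts (assumption, not taken from the paper):
# 40 of 41 TMPs found, 9 of 285 soluble proteins misclassified.
rates = protein_level_rates(tp=40, fn=1, fp=9, tn=276)
print(f"sensitivity = {rates['sensitivity']:.1%}")
print(f"false positive rate = {rates['false_positive_rate']:.1%}")
```

The ± values in the abstract are error estimates on these rates; how they were obtained is not stated in this section, so the sketch leaves them out.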
