A Combination of Compositional Index and Genetic Algorithm for Predicting Transmembrane Helical Segments

Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine experimentally. Methods that can computationally predict the topology of helical membrane proteins are therefore highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is derived from two sources: the difference in amino acid occurrences between TMH and non-TMH segments in the training protein sequences, and the amino acid composition information. A genetic algorithm is then used to find the optimal threshold value for separating TMH segments from non-TMH segments. The method correctly predicted 376 of the 378 TMH segments in a dataset of 70 test protein sequences. The per-residue sensitivity and specificity over all protein sequences in this dataset were 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested the approach on another standard 3D helix dataset of 73 proteins, where it correctly predicted 91.8% of the proteins at the TM-segment level. The accuracy achieved by TMHindex, compared with other recent approaches for predicting the topology of TM proteins, is a strong argument in favor of the proposed method. Availability: The datasets and software, together with supplementary materials, are available at: http://faculty.uaeu.ac.ae/nzaki/TMHindex.htm.
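The two steps described above (a per-residue compositional score, then a genetic search for a separating threshold) can be illustrated with a minimal sketch. It is not the TMHindex implementation: the log-odds form of the index, the pseudocount, the sliding-window smoothing, the per-residue accuracy used as fitness, and the selection, crossover and mutation operators are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compositional_index(sequences, labels, pseudocount=1.0):
    """Assumed log-odds index: > 0 means enriched in TMH residues.

    labels[i] has the same length as sequences[i]; 'M' marks a TMH residue.
    """
    tmh, non = Counter(), Counter()
    for seq, lab in zip(sequences, labels):
        for aa, l in zip(seq, lab):
            (tmh if l == "M" else non)[aa] += 1
    tmh_total = sum(tmh.values()) + pseudocount * len(AMINO_ACIDS)
    non_total = sum(non.values()) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log(((tmh[aa] + pseudocount) / tmh_total)
                     / ((non[aa] + pseudocount) / non_total))
        for aa in AMINO_ACIDS
    }


def smoothed_scores(seq, index, window=5):
    """Average the index over a centered window (window size is an assumption)."""
    half = window // 2
    raw = [index.get(aa, 0.0) for aa in seq]
    scores = []
    for i in range(len(raw)):
        chunk = raw[max(0, i - half): i + half + 1]
        scores.append(sum(chunk) / len(chunk))
    return scores


def predict(seq, index, threshold, window=5):
    """Label a residue 'M' when its smoothed score exceeds the threshold."""
    return "".join("M" if s > threshold else "-"
                   for s in smoothed_scores(seq, index, window))


def fitness(threshold, sequences, labels, index, window=5):
    """Per-residue accuracy on labeled sequences (assumed fitness function)."""
    correct = total = 0
    for seq, lab in zip(sequences, labels):
        pred = predict(seq, index, threshold, window)
        correct += sum((p == "M") == (l == "M") for p, l in zip(pred, lab))
        total += len(seq)
    return correct / total


def ga_threshold(sequences, labels, index, pop_size=20, generations=30,
                 low=-2.0, high=2.0, seed=0):
    """Tiny real-valued GA over a single gene, the threshold."""
    rng = random.Random(seed)

    def score(t):
        return fitness(t, sequences, labels, index)

    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(population, key=score, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover plus Gaussian mutation, clipped to the range.
            child = (a + b) / 2 + rng.gauss(0, 0.1)
            children.append(min(high, max(low, child)))
        population = parents + children
    return max(population, key=score)
```

In practice the index would be built on training proteins and the threshold tuned on those same training labels, with sensitivity and specificity then reported on held-out test proteins such as the 70-protein set mentioned above.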
