IgTM: An algorithm to predict transmembrane domains and topology in proteins

Background: Because they act as receptors or transporters, membrane proteins play a key role in many important biological functions. In this work we use Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages.

Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets were used in the experiments, with different encodings considered for the input sequences. The best results were obtained with an encoding that includes the topology changes in the sequence (from inside or outside the membrane into it, and vice versa). The software is publicly available at: http://www.dsic.upv.es/users/tlcc/bio/bio.html

Conclusion: We compared our results with those of other well-known methods, which obtain slightly better precision. Nevertheless, this work shows that Grammatical Inference techniques can be applied effectively to bioinformatics problems.
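The abstract does not describe how Even Linear Languages are inferred. One common route, assumed here rather than taken from the text, is to fold each string from both ends into a sequence of symbol pairs, so that a learner for regular languages can be applied to the folded strings. The sketch below illustrates only that folding step; the function names `fold_even_linear` and `unfold_even_linear` and the tuple representation are hypothetical and not part of the IgTM software.

```python
def fold_even_linear(s):
    """Pair the i-th symbol with the i-th symbol from the end, moving inward.

    Assumption: this mirrors the usual reduction of even linear languages
    to regular languages; it is not taken from the IgTM implementation.
    A string of odd length leaves one unpaired middle symbol.
    """
    folded = []
    i, j = 0, len(s) - 1
    while i < j:
        folded.append((s[i], s[j]))  # outermost remaining pair
        i += 1
        j -= 1
    if i == j:
        folded.append((s[i],))       # unpaired middle symbol
    return folded


def unfold_even_linear(folded):
    """Inverse of fold_even_linear: rebuild the original string."""
    left, right, middle = [], [], ""
    for item in folded:
        if len(item) == 2:
            left.append(item[0])
            right.append(item[1])
        else:
            middle = item[0]
    return "".join(left) + middle + "".join(reversed(right))
```

For example, `fold_even_linear("abcde")` yields the pairs `("a", "e")` and `("b", "d")` followed by the single middle symbol `("c",)`, and `unfold_even_linear` recovers `"abcde"` from that sequence.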
