Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs

A computational model, IMP‐TYPE, is proposed for the classification of five types of integral membrane proteins from protein sequence. The model aims not only to provide accurate predictions but, more importantly, to incorporate interesting and transparent biological patterns. Compared with the best‐performing existing models, IMP‐TYPE reduces their error rates by 19% and 34% in two out‐of‐sample tests performed on benchmark datasets. Our empirical evaluations also show that the proposed method provides even larger improvements, i.e., 29% and 45% error rate reductions, when predictions are performed for sequences that share low (40%) identity with sequences from the training dataset. We also show that IMP‐TYPE can be used in a standalone mode, i.e., it duplicates a significant majority of the correct predictions provided by other leading methods, while also correctly classifying sequences that the other methods misclassify. Our method computes predictions using a Support Vector Machine classifier that takes a feature‐based encoding of the sequence as its input. The input feature set includes hydrophobic amino acid (AA) pairs, which were selected using a consensus of three feature selection algorithms. A few recent studies of integral membrane proteins associate the hydrophobic residues that make up the AA pairs used by our method with the formation of transmembrane helices. Our study also indicates that Met and Phe display a certain degree of hydrophobicity, which may be more crucial than their polarity or aromaticity when they occur in transmembrane segments. This conclusion is supported by a recent study on the potential of mean force for membrane protein folding and by a study of scales for the membrane propensity of amino acids. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2009
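To make the idea of a feature‐based encoding built from collocated hydrophobic AA pairs more concrete, the sketch below counts pairs of hydrophobic residues separated by a fixed number of intervening positions and normalizes each count by the number of positions examined. It is an illustration only, not the IMP‐TYPE implementation: the residue set `HYDROPHOBIC`, the `max_gap` value, the normalization, and the function name `collocated_pair_features` are all assumptions, since the abstract does not specify them. In the abstract's pipeline, a vector of this kind would be reduced by feature selection and then passed to a Support Vector Machine classifier.

```python
from itertools import product

# Assumed hydrophobic alphabet, for illustration only;
# the residues actually selected in the paper may differ.
HYDROPHOBIC = "AFILMVW"


def collocated_pair_features(sequence, max_gap=2, alphabet=HYDROPHOBIC):
    """Return normalized counts of residue pairs separated by 0..max_gap positions.

    Keys look like "LL" (adjacent), "LxL" (one residue between), "LxxL", etc.
    Each count is divided by the number of pair positions examined for that gap.
    """
    sequence = sequence.upper()
    features = {}
    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two residues of a pair
        n_windows = max(len(sequence) - step, 0)
        counts = {pair: 0 for pair in product(alphabet, repeat=2)}
        for i in range(n_windows):
            pair = (sequence[i], sequence[i + step])
            if pair in counts:  # ignore pairs with a non-hydrophobic residue
                counts[pair] += 1
        for (a, b), count in counts.items():
            key = f"{a}{'x' * gap}{b}"
            features[key] = count / n_windows if n_windows else 0.0
    return features
```

Calling `collocated_pair_features("LLAL")` yields one value per pair and gap; the resulting dictionary can be turned into a fixed‐order numeric vector by sorting its keys before handing it to a classifier.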
