A sequence‐based computational model for the prediction of the solvent accessible surface area for α‐helix and β‐barrel transmembrane residues

Predicting the solvent accessible surface area (ASA) of transmembrane (TM) residues is of great importance for experimental researchers seeking to elucidate diverse physiological processes. TM residues fall into two major structural classes: those of α‐helix membrane proteins and those of β‐barrel membrane proteins. Previously reported solvent ASA prediction models were developed separately for each of these two types of TM residues. This limits their general use, because one cannot determine which model is suitable for a given TM residue without knowing its type. To overcome this limitation, we developed a new computational model that can predict the ASA of both TM α‐helix and β‐barrel residues. The model was developed from 78 α‐helix membrane protein chains and 24 β‐barrel membrane proteins. Its predictive ability was evaluated by cross-validation and by its predictions on an independent test set of 20 membrane protein chains. The results show that our model performs well for both types of TM residues and outperforms other prediction models that were developed for a specific type of TM residue. The results also show that a random forest model incorporating conservation scores is an effective sequence‐based computational approach for predicting the solvent ASA of TM residues. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
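As a rough illustration of what a per-residue conservation score can look like, the sketch below scores each column of a multiple sequence alignment as one minus its normalized Shannon entropy. This is an assumption made for illustration only: the abstract does not state which conservation measure the model uses, and the function names `column_conservation` and `conservation_profile` are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1 - normalized Shannon entropy of one alignment column.

    Gap characters ('-') are ignored. A score of 1.0 means the column is
    fully conserved; lower scores mean more residue variability.
    (Illustrative measure, not necessarily the one used in the paper.)
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0  # column of gaps only: treat as uninformative
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Normalize by the maximum entropy over the 20 standard amino acids.
    return 1.0 - entropy / math.log(alphabet_size)


def conservation_profile(alignment):
    """Score every column of a list of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy alignment of four sequences, three columns (hypothetical data).
toy_alignment = ["ALV", "ALI", "AGL", "A-F"]
profile = conservation_profile(toy_alignment)
```

In a sequence-based predictor of this kind, such a profile would typically be one of several per-residue input features passed to the regression model.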
