SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides

Background: Identifying putative membrane transport proteins (MTPs) and understanding their transport mechanisms remain important challenges for structural and functional genomics. However, the characteristics of transporters are mainly derived from MTP crystal structures, and MTPs are hard to crystallize. It is therefore desirable to develop bioinformatics tools that can analyze available sequences on a large scale to identify and characterize novel transporters.

Results: This work proposes a novel method (SCMMTP), based on the scoring card method (SCM) with dipeptide composition, to identify and characterize MTPs. It uses an existing dataset of 900 MTPs and 660 non-MTPs, split into a training dataset of 1,380 proteins and an independent test dataset of 180 proteins. SCMMTP estimates propensity scores of amino acids and dipeptides for being part of an MTP. Its training and test accuracies reached 83.81% and 76.11%, respectively. A support vector machine (SVM) using position-specific substitution matrix (PSSM) features, a more complex classifier that offers little scope for biological interpretation, reached a test accuracy of 80.56%; SCMMTP is thus comparable to SVM-PSSM. To identify MTPs, SCMMTP was applied to three datasets: 1) human transmembrane proteins, 2) a photosynthetic protein dataset, and 3) a human protein database. The analysis indicates that MTPs are rich in α-helical structure, in agreement with previous studies. MTPs also favor residues with low hydration energy. It is hypothesized that, after substrates are filtered, their hydration water molecules need to be released from the pore regions.

Conclusions: SCMMTP yields estimated propensity scores of amino acids and dipeptides for MTPs, which can be used to identify novel MTPs and to characterize transport mechanisms for further experiments.

Availability: http://iclab.life.nctu.edu.tw/iclab_webtools/SCMMTP/
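The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It computes a 400-dimensional dipeptide composition, derives rough propensity scores from the difference in mean composition between two classes, and classifies a sequence by comparing its weighted score with a threshold. The function names, the 0 to 1000 rescaling, the threshold, and the toy sequences are all assumptions made for illustration; the abstract does not specify how SCMMTP derives or refines its scores, so this should not be read as the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the fraction of each dipeptide among adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return dict.fromkeys(DIPEPTIDES, 0.0)
    return {dp: c / total for dp, c in counts.items()}


def initial_propensity_scores(positives, negatives):
    """Hypothetical initialization: class difference rescaled to 0..1000."""
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}

    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all differences match
    return {dp: 1000.0 * (diff[dp] - lo) / span for dp in DIPEPTIDES}


def scm_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * scores[dp] for dp in DIPEPTIDES)


def classify(seq, scores, threshold):
    """Predict the positive class when the score exceeds the threshold."""
    return scm_score(seq, scores) > threshold


# Toy data, invented purely to exercise the functions.
toy_scores = initial_propensity_scores(["LLLL"], ["KKKK"])
print(classify("LLLL", toy_scores, threshold=500.0))
print(classify("KKKK", toy_scores, threshold=500.0))
```

Because each dipeptide keeps its own score, the resulting table can be inspected directly, which is the interpretability advantage the abstract contrasts with SVM-PSSM.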
