Discrimination of membrane transporter protein types using K-nearest neighbor method derived from the similarity distance of total diversity measure.

Membrane transporters play crucial roles in the fundamental cellular processes of living organisms, and computational techniques are needed to annotate transporter functions. In this study, a multi-class K-nearest neighbor classifier based on the increment of diversity (KNN-ID) was developed to discriminate membrane transporter types, with the increment of diversity (ID) introduced as a novel similarity distance. Comparisons with multiple recently published methods showed that the proposed KNN-ID method outperformed them, improving overall accuracy by more than 20%. The overall prediction accuracy reached 83.1% when K was set to 2. The prediction sensitivity was 76.7%, 89.1%, and 80.1% for channels/pores, electrochemical potential-driven transporters, and primary active transporters, respectively. Pairwise discrimination and comparison between any two classes of transporters further demonstrated that the proposed method is a promising classifier that could play a complementary role in facilitating the functional assignment of transporters.
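To make the idea of a nearest-neighbor classifier driven by the increment of diversity concrete, the following is a minimal sketch rather than the method as implemented in the study. It uses the standard diversity measure D(X) = N log N - Σ n_i log n_i for a count vector X with total N, and ID(X, Y) = D(X + Y) - D(X) - D(Y). The feature choice (raw 20-letter amino acid counts), the tie-breaking rule, the function names, and the toy sequences and labels are all assumptions made for illustration; the abstract does not specify the features or any normalization actually used.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed feature space: 20 standard residues


def composition(seq):
    """Count vector of the 20 standard amino acids in a sequence (assumed feature)."""
    counts = Counter(seq)
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_id_predict(query, training, k=2):
    """Label a query sequence by majority vote among its k smallest-ID neighbors.

    `training` is a list of (sequence, label) pairs. Ties in the vote are broken
    in favor of the closest neighbor (an assumption, not taken from the paper).
    """
    q = composition(query)
    scored = sorted(
        (increment_of_diversity(q, composition(seq)), label)
        for seq, label in training
    )
    nearest = scored[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = {label for label, v in votes.items() if v == top}
    for _, label in nearest:  # nearest is sorted by ascending ID
        if label in tied:
            return label


# Toy data, purely hypothetical: not real transporter sequences.
toy_training = [
    ("AAAAGG", "channel/pore"),
    ("AAAGGG", "channel/pore"),
    ("KKKKLL", "primary active"),
    ("KKKLLL", "primary active"),
]
print(knn_id_predict("AAAAAG", toy_training, k=2))
```

Because the raw ID grows with sequence length, a practical implementation would likely need some form of normalization or fixed-size feature vectors; that choice is left open here.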
