Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs

Background: In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacterial) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.

Results: The models were trained and tested on 852 mycobacterial proteins and evaluated using five-fold cross-validation. The first SVM (Support Vector Machine) model was developed using amino acid composition and achieved an overall accuracy of 82.51%, with an average accuracy (mean of the class-wise accuracies) of 68.47%. To utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST); it achieved an overall accuracy of 86.62%, with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model), MEME/MAST (Multiple Em for Motif Elicitation/Motif Alignment and Search Tool) and hybrid models combining two or more of these models were developed. We achieved a maximum overall accuracy of 86.8%, with an average accuracy of 89.00%, using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of existing methods developed for predicting the subcellular locations of Gram-positive bacterial proteins.

Conclusion: A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. The method also predicts a very important class of proteins, namely membrane-attached proteins. It will be useful for annotating newly sequenced or hypothetical mycobacterial proteins. Based on this study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.
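The abstract reports two figures per model: overall accuracy and average accuracy (the mean of the class-wise accuracies). The Python sketch below illustrates both, along with the 20-dimensional amino acid composition vector of the kind used as input to the first SVM model. It is not TBpred's code. Expressing composition as percentages and ignoring non-standard residues (X, B, Z, ...) are assumptions made for illustration, and the class labels in the example are hypothetical. The SVM itself, the PSSM features and the five-fold splitting are not shown.

```python
from __future__ import annotations

from collections import Counter

# The 20 standard amino acids in a fixed order, so every feature vector aligns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Return the percentage of each standard amino acid in `seq` (20 values).

    Assumption: non-standard letters are dropped from both the counts and the
    sequence length used as the denominator.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def overall_and_average_accuracy(y_true: list[str], y_pred: list[str]):
    """Return (overall %, average %, per-class % dict).

    Overall accuracy counts every protein equally, so large classes dominate it.
    Average accuracy is the unweighted mean of the per-class accuracies, so
    each location class counts equally regardless of its size.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    overall = 100.0 * correct / len(y_true)

    per_class: dict[str, float] = {}
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in members)
        per_class[cls] = 100.0 * hits / len(members)
    average = sum(per_class.values()) / len(per_class)
    return overall, average, per_class


if __name__ == "__main__":
    # Hypothetical, imbalanced toy labels: 8 cytoplasmic, 2 membrane-attached.
    truth = ["cytoplasmic"] * 8 + ["membrane-attached"] * 2
    preds = ["cytoplasmic"] * 9 + ["membrane-attached"]
    print(overall_and_average_accuracy(truth, preds))
```

Running the toy example prints `(90.0, 75.0, {'cytoplasmic': 100.0, 'membrane-attached': 50.0})`. The same predictions score 90% overall but only 75% on average, because the minority class is predicted poorly. This is why reporting both figures is informative on unevenly sized location classes. Average accuracy can also exceed overall accuracy when the smaller classes are predicted better than the largest one, as in the hybrid model's figures above.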
