ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins

Background

The growth of raw protein sequence databases in the post-genomic era, together with the availability of newly annotated sequences for the major localizations, motivated us to develop an improved version of "ESLpred", our earlier method for predicting the subcellular localization of eukaryotic proteins. Because the subcellular localization of a protein offers essential clues about its function, a localization predictor can aid and expedite protein characterization studies. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein features, so it is important to improve the existing method using the latest dataset and more informative input features.

Results

Here we describe the improvement in prediction performance obtained for our ESLpred method by using new features as input to a Support Vector Machine (SVM). The study uses a recently available, highly non-redundant dataset comprising three kingdom-specific protein sequence sets: 1198 fungal, 2597 animal and 491 plant sequences. First, using evolutionary information in the form of profile composition, together with whole-sequence and N-terminal sequence composition, as an input feature vector of 440 dimensions, overall accuracies of 72.7, 75.8 and 74.5% were achieved for the fungal, animal and plant sets, respectively, under five-fold cross-validation. Performance improved further when similarity-search-based results were combined with whole-sequence, N-terminal and profile composition, yielding overall accuracies of 75.9, 80.8 and 76.6%, respectively, the best accuracies reported to date on these datasets.

Conclusion

These results support the reliability and accuracy of the SVM modules developed in the present study using sequence and profile compositions along with similarity-search-based results. The modules are implemented as the web server "ESLpred2", available at http://www.imtech.res.in/raghava/eslpred2/.
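As an illustration of the 440-dimensional input described in the Results, the following minimal Python sketch concatenates a profile composition with whole-sequence and N-terminal amino acid compositions. The abstract does not give the breakdown, so the split assumed here (400 profile values + 20 + 20), the N-terminal window of 25 residues, the length scaling of the profile, and all function names are assumptions for illustration only, not the authors' implementation. The profile is taken to be a position-specific scoring matrix with one row of 20 scores per residue, such as PSI-BLAST can produce.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]

def profile_composition(seq, pssm):
    """Sum profile rows by residue type into a 20 x 20 table (400 values).

    pssm holds one row of 20 scores per sequence position. Dividing by
    the sequence length is an assumed normalization.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    table = [[0.0] * 20 for _ in range(20)]
    for ch, row in zip(seq.upper(), pssm):
        i = index.get(ch)
        if i is None:  # skip non-standard residues
            continue
        for j, score in enumerate(row):
            table[i][j] += score
    n = max(len(seq), 1)
    return [value / n for row in table for value in row]

def feature_vector(seq, pssm, n_term=25):
    """Concatenate profile, whole-sequence and N-terminal compositions.

    n_term=25 is a hypothetical window size chosen for this sketch.
    """
    return (profile_composition(seq, pssm)
            + composition(seq)
            + composition(seq[:n_term]))
```

A vector built this way could then be passed to any SVM implementation; the similarity-search step that the abstract combines with the SVM output is not covered by this sketch.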
