PredSL: A Tool for the N-terminal Sequence-based Prediction of Protein Subcellular Localization

The ability to predict the subcellular localization of a protein from its sequence is of great importance, as it provides information about the protein’s function. We present a computational tool, PredSL, which uses neural networks, Markov chains, profile hidden Markov models, and scoring matrices to predict the subcellular localization of proteins in eukaryotic cells from the N-terminal amino acid sequence. It classifies proteins into five groups: chloroplast, thylakoid, mitochondrion, secretory pathway, and “other”. When tested in a five-fold cross-validation procedure, PredSL achieves 86.7% and 87.1% overall accuracy on the plant and non-plant datasets, respectively. In most cases, its results are comparable to those of TargetP, the most widely used method to date, and LumenP. When tested on the experimentally verified proteins of the Saccharomyces cerevisiae genome, PredSL performs comparably to, if not better than, any available algorithm for the same task. Furthermore, PredSL is the only method capable of predicting these subcellular localizations that is available as a stand-alone application, through the URL: http://bioinformatics.biol.uoa.gr/PredSL/.
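To make the evaluation protocol concrete, the following is a minimal sketch of how an overall accuracy figure can be obtained from a five-fold cross-validation. It is not PredSL's implementation: the function names (`k_fold_indices`, `cross_validated_accuracy`, `train_majority`), the contiguous fold assignment, and the majority-class stand-in classifier are all assumptions made for illustration, and the abstract does not specify how the folds were actually constructed.

```python
from collections import Counter
from typing import Callable, List, Sequence

# The five classes named in the abstract; the string labels are illustrative.
LABELS = ("chloroplast", "thylakoid", "mitochondrion", "secretory", "other")


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(
    samples: Sequence[str],
    labels: Sequence[str],
    train: Callable[[List[str], List[str]], Callable[[str], str]],
    k: int = 5,
) -> float:
    """Overall accuracy: correct held-out predictions divided by all samples."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        # Train only on the samples outside the current held-out fold.
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train(train_x, train_y)
        correct += sum(predict(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def train_majority(train_x: List[str], train_y: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda sequence: majority


# Toy data: ten made-up N-terminal fragments, eight "other" and two "secretory".
toy_samples = ["MKT", "MAL", "MSS", "MLR", "MEE", "MKK", "MAA", "MGG", "MTT", "MPP"]
toy_labels = ["secretory", "other", "other", "other", "other",
              "secretory", "other", "other", "other", "other"]
accuracy = cross_validated_accuracy(toy_samples, toy_labels, train_majority)
```

Any trainer that returns a prediction function can be substituted for `train_majority`. In practice the folds would normally be shuffled or stratified by class so that each fold reflects the class balance of the full dataset.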
