BaCelLo: a balanced subcellular localization predictor

MOTIVATION: Knowing the subcellular localization of a protein is fundamental for elucidating its function. Determining the subcellular location of proteins in eukaryotic cells with high-throughput experimental procedures is difficult. Computational procedures are therefore needed to annotate the subcellular location of proteins in large-scale genomic projects.

RESULTS: BaCelLo is a predictor for five classes of subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast), based on different SVMs organized in a decision tree. The system exploits information derived from the residue sequence and the evolutionary information contained in alignment profiles. It analyzes the composition of the whole sequence and the compositions of both the N- and C-termini. The training set is curated to avoid redundancy. For the first time, a balancing procedure is introduced to mitigate the effect of biased training sets. Three kingdom-specific predictors are implemented, for animals, plants and fungi, respectively. When proteins from animals and fungi are distributed into four classes, the accuracy of BaCelLo reaches 74% and 76%, respectively; a score of 67% is obtained when proteins from plants are distributed into five classes. BaCelLo outperforms the other presently available methods for the same task and gives more balanced accuracy and coverage values for each class. We also predict the subcellular localization of five whole proteomes (Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana) and compare the protein content of each compartment.

AVAILABILITY: BaCelLo can be accessed at http://www.biocomp.unibo.it/bacello/.
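The composition-based input described in the results can be illustrated with a minimal sketch. It is not BaCelLo's implementation: the function names and the terminal window lengths (`n_term`, `c_term`) are hypothetical choices made for illustration, and the evolutionary information from alignment profiles is not covered here.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the frequency of each standard amino acid in a segment.

    Non-standard characters are ignored; an empty segment gives all zeros.
    """
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def composition_features(sequence, n_term=20, c_term=20):
    """Concatenate whole-sequence, N-terminal and C-terminal compositions.

    n_term and c_term are illustrative window lengths (an assumption,
    not values taken from the abstract). The result has 60 entries.
    """
    seq = sequence.upper()
    n_segment = seq[:n_term]
    c_segment = seq[max(len(seq) - c_term, 0):]
    return composition(seq) + composition(n_segment) + composition(c_segment)


if __name__ == "__main__":
    features = composition_features("MKKLLAAA", n_term=3, c_term=3)
    print(len(features))
```

A vector of this kind could serve as input to any classifier, including the SVMs mentioned above; how the actual system combines such vectors with profile information is not specified in the abstract.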
