Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data

Computational prediction of protein subcellular localization is a challenging problem. Several approaches have been presented during the past few years; some attempt to cover a wide variety of localizations, while others focus on a small number of localizations and on specific organisms. We present a comprehensive system that integrates protein sequence-derived data with text-based information. It is tested on three large data sets previously used by leading prediction methods. The evaluation shows that our system significantly improves on previously reported results across a wide range of eukaryotic subcellular localizations.
