MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition

MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics, and a key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, but they typically focus on a single underlying biological aspect or predict only a subset of all possible localizations. Emulating the protein sorting process by capturing and combining biologically relevant information is an important step towards meeting the clear need for better prediction accuracy and localization coverage.

RESULTS: We present a novel SVM-based approach for predicting subcellular localization that integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves prediction based on N-terminal targeting sequences by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods that predict all major eukaryotic subcellular localizations, and shows better or comparable results relative to methods that are specialized for fewer localizations or for one organism.

AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/
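
Amino acid composition, one of the three information sources named above, can be illustrated with a minimal sketch. The abstract does not specify how features are encoded, so everything here is an assumption made for illustration: the function names, the 20-letter alphabet ordering, and the `n_terminal_length` cutoff of 20 residues are hypothetical and are not taken from MultiLoc or TargetLoc. The sketch only shows how a sequence can be turned into a fixed-length numeric vector of the kind an SVM could consume.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Characters outside the 20-letter alphabet (e.g. 'X') are ignored.
    An empty or fully non-standard sequence yields an all-zero vector.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(sequence, n_terminal_length=20):
    """Concatenate whole-sequence and N-terminal composition (40 values).

    n_terminal_length is a hypothetical cutoff chosen for this example only.
    """
    whole = amino_acid_composition(sequence)
    n_terminal = amino_acid_composition(sequence[:n_terminal_length])
    return whole + n_terminal


if __name__ == "__main__":
    # Made-up example sequence, not a real protein.
    example = "MKTLLLAVAVSLLGASAQDEFGHIKWY"
    vector = feature_vector(example)
    print(len(vector))
```

A vector built this way could be passed to any SVM implementation; treating the N-terminal region separately reflects the idea that targeting signals are concentrated there, though the actual method may encode this information quite differently.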