Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

A neural network-based tool, TargetP, has been developed for large-scale prediction of the subcellular location of newly identified proteins. Using N-terminal sequence information only, it discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and "other" localizations with a success rate of 85% (plant) or 90% (non-plant) on redundancy-reduced test sets. From a TargetP analysis of the recently sequenced Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein set, we estimate that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that secretory proteins make up around 10% of the proteins in both Arabidopsis and Homo. TargetP also predicts cleavage sites; the proportion of correctly predicted sites ranges from approximately 40% to 50% for chloroplastic and mitochondrial presequences to above 70% for secretory signal peptides. TargetP is available as a web server at http://www.cbs.dtu.dk/services/TargetP/.
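The two kinds of figures quoted above, a success rate on an annotated test set and the estimated share of a proteome assigned to each compartment, can be illustrated with a minimal sketch. It is not TargetP's own code: the one-letter labels ("M", "C", "S", "O") and both function names are assumptions introduced here for illustration, and the predictions are taken as already computed.

```python
from collections import Counter

# Hypothetical one-letter labels (an assumption, not TargetP's output format):
# "M" mitochondrion, "C" chloroplast, "S" secretory pathway, "O" other.
LABELS = ("M", "C", "S", "O")


def success_rate(true_labels, predicted_labels):
    """Fraction of proteins whose predicted location matches the annotation."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no proteins to score")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def predicted_fractions(predicted_labels):
    """Share of a protein set assigned to each location by the predictor."""
    if not predicted_labels:
        raise ValueError("no predictions to summarize")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: counts.get(label, 0) / total for label in LABELS}


if __name__ == "__main__":
    # Toy data, invented for the example only.
    annotated = list("MCSOMCSO")
    predicted = list("MCSOMCOO")
    print(success_rate(annotated, predicted))
    print(predicted_fractions(predicted))
```

Raw predicted fractions of this kind are only a first estimate of true abundance, since they inherit the predictor's false positives and false negatives for each class.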
