Prediction of signal peptides in protein sequences by neural networks.

We present a neural network-based method for detecting signal peptides in proteins. The method is trained on sequences of known signal peptides extracted from the Swiss-Prot protein database and can be applied separately to prokaryotic and eukaryotic proteins. A query protein is dissected into overlapping short sequence fragments, and each fragment is then scored for the probability that it is a signal peptide and that it contains a cleavage site. The accuracy of cleavage site prediction reaches 73% on heterogeneous source data containing both prokaryotic and eukaryotic sequences, while the accuracy of discrimination between signal peptides and non-signal peptides is above 93% for any source dataset. Although this accuracy is comparable to that of other existing prediction tools, the method offers significantly higher speed and portability, and can therefore be applied easily to genome-wide datasets. The software can be downloaded freely from http://rpsp.bioinfo.pl/RPSP.tar.gz.
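The fragment dissection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `overlapping_fragments`, the window length, the step size, and the example sequence are all assumptions chosen for illustration, since the abstract does not state the actual fragment length or overlap. The sketch only produces the overlapping fragments; scoring each one with a trained network is left out.

```python
from typing import Iterator, Tuple


def overlapping_fragments(
    sequence: str, window: int = 20, step: int = 1
) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, fragment) for every full-length window.

    `window` and `step` are illustrative defaults, not values from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Slide the window along the sequence; fragments that would run past
    # the end of the sequence are skipped.
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


# Hypothetical 21-residue sequence, dissected into 5-residue fragments.
fragments = list(overlapping_fragments("MKKTAIAIAVALAGFATVAQA", window=5, step=1))
```

With a step of 1, consecutive fragments share all but one residue, so each fragment can in principle be passed independently to a classifier that estimates whether it belongs to a signal peptide and whether it spans a cleavage site.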
