Predicting Signal Peptides and Their Cleavage Sites Using Support Vector Machines and Improved Position Weight Matrixes

In this paper, we develop a method for predicting signal peptides and their cleavage sites. Unlike previously published methods, we divide each protein into two segments and calculate the amino acid composition of each segment. We then hybridize the pseudo amino acid compositions (PseAAs) into the feature vectors. Training support vector machines (SVMs) on these datasets gives better results than the optimized evidence-theoretic K nearest neighbor (OET-KNN) classifier. The overall rate of correct prediction for signal peptides is over 97%. To identify cleavage sites, we use the scaled window proposed by Chou to extract cleavable and non-cleavable secretory segments, and we improve the position weight matrix (PWM) method proposed by Hiller et al. By hybridizing the scaled window and PWM methods, we obtain cleavage site prediction accuracy that is better than or comparable to that of other methods.
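The two ingredients named in the abstract, segment-wise amino acid composition and PWM scoring of candidate cleavage sites, can be illustrated with a minimal sketch. It is not the authors' implementation: the split position (`split=20`), the pseudocount, the uniform background frequency of 1/20, and all function names are assumptions made for illustration, and the PseAA terms and the paper's PWM improvements are not reproduced.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(segment):
    """Frequencies of the 20 standard amino acids in a segment (zeros if empty)."""
    n = sum(1 for aa in segment if aa in AMINO_ACIDS)
    if n == 0:
        return [0.0] * 20
    return [segment.count(aa) / n for aa in AMINO_ACIDS]

def two_segment_features(seq, split=20):
    """40-dimensional vector: composition of seq[:split] followed by seq[split:].
    The split position is an assumed value, not taken from the paper."""
    return composition(seq[:split]) + composition(seq[split:])

def build_pwm(windows, pseudocount=1.0):
    """Log-odds PWM from equal-length aligned windows.
    Assumes a uniform background of 1/20 and additive pseudocounts."""
    pwm = []
    for pos in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[pos] in counts:
                counts[w[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / 0.05) for aa, c in counts.items()})
    return pwm

def score_window(pwm, window):
    """Sum of per-position log-odds scores; non-standard residues score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pwm, window))

def best_cleavage_site(pwm, seq, upstream):
    """Slide the PWM along seq and return the predicted signal peptide length,
    i.e. the number of residues before the best-scoring cut.
    `upstream` is the number of window positions that precede the cut."""
    width = len(pwm)
    best_pos, best_score = None, float("-inf")
    for i in range(len(seq) - width + 1):
        s = score_window(pwm, seq[i:i + width])
        if s > best_score:
            best_pos, best_score = i + upstream, s
    return best_pos

# Toy usage with made-up windows (three residues before the cut, one after).
pwm = build_pwm(["AVAQ", "ALAQ", "ASAQ"])
print(best_cleavage_site(pwm, "MKTLLGGAVAQDD", upstream=3))  # 10
print(len(two_segment_features("MKTLLGGAVAQDD" * 3)))        # 40
```

In a full pipeline, vectors such as those from `two_segment_features` would be passed to an SVM trainer (for example scikit-learn's `SVC`), and the windows used to build the PWM would come from annotated cleavage sites rather than the toy strings above.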

[1]  Peixiang Cai,et al.  Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. , 2006, Analytical biochemistry.

[2]  L. Gierasch Signal sequences. , 1989, Biochemistry.

[3]  S. Brunak,et al.  Improved prediction of signal peptides: SignalP 3.0. , 2004, Journal of molecular biology.

[4]  K. Chou Using subsite coupling to predict signal peptides. , 2001, Protein engineering.

[5]  Kuo-Chen Chou,et al.  Prediction of protease types in a hybridization space. , 2006, Biochemical and biophysical research communications.

[6]  X.-D. Sun,et al.  Prediction of protein structural classes using support vector machines , 2006, Amino Acids.

[7]  G von Heijne,et al.  Signal sequences. The limits of variation. , 1985, Journal of molecular biology.

[8]  Kuo-Chen Chou,et al.  Support vector machines for prediction of protein signal sequences and their cleavage sites , 2003, Peptides.

[9]  J. Gordon,et al.  Computer-assisted predictions of signal peptidase processing sites. , 1987, Biochemical and biophysical research communications.

[10]  K. Chou Prediction of signal peptides using scaled window , 2001, Peptides.

[11]  K. Chou,et al.  Prediction of protein signal sequences and their cleavage sites by statistical rulers. , 2005, Biochemical and biophysical research communications.

[12]  G. von Heijne,et al.  Signal sequences: The limits of variation , 1985 .

[13]  G. von Heijne A new method for predicting signal sequence cleavage sites. , 1986 .

[14]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[15]  Kuo-Chen Chou,et al.  Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. , 2007, Biochemical and biophysical research communications.

[16]  Q. Pan,et al.  Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution , 2008, Amino Acids.

[17]  Kuo-Chen Chou,et al.  Predicting enzyme family class in a hybridization space , 2004, Protein science : a publication of the Protein Society.

[18]  G. von Heijne,et al.  A new method for predicting signal sequence cleavage sites , 1986 .

[19]  Hiroyuki Ogata,et al.  AAindex: Amino Acid Index Database , 1999, Nucleic Acids Res..

[20]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[21]  K. Chou,et al.  Predicting protein-protein interactions from sequences in a hybridization space. , 2006, Journal of proteome research.

[22]  Dieter Jahn,et al.  PrediSi: prediction of signal peptides and their cleavage positions , 2004, Nucleic Acids Res..

[23]  Yanda Li,et al.  Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence , 2006, BMC Bioinformatics.

[24]  K. Chou,et al.  Prediction of protein signal sequences and their cleavage sites , 2001, Proteins.

[25]  Kuo-Chen Chou,et al.  Prediction of protein signal sequences. , 2002, Current protein & peptide science.

[26]  S. Brunak,et al.  Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites , 1997 .

[27]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.