Support vector machines for prediction of protein signal sequences and their cleavage sites

Given a nascent protein sequence, how can one predict its signal peptide, or "zip code" sequence? Solving this problem matters because signal peptides can serve as a vehicle for finding new drugs or for reprogramming cells for gene therapy (see, e.g., K.C. Chou, Current Protein and Peptide Science 2002;3:615-22). In this paper, support vector machines (SVMs), a new machine learning method, are applied to this problem. The overall rate of correct prediction for 1939 secretory proteins and 1440 nonsecretory proteins was over 91%. It has not escaped our attention that the new method may also serve as a useful tool for further investigating many unclear details of the molecular mechanism of the zip code protein-sorting system in cells.
