Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier

Predicting protein secondary structure is an important problem in bioinformatics with many applications. Recent secondary structure prediction studies are mostly based on neural networks or the support vector machine (SVM). The SVM is a comparatively new learning system that has mostly been used in pattern recognition problems. In this study, the SVM is used as the machine learning tool for secondary structure prediction, and several encoding schemes, including the orthogonal matrix, the hydrophobicity matrix, the BLOSUM62 substitution matrix, and a combined matrix of these, are applied and optimized to improve prediction accuracy. The optimal window length for the six SVM binary classifiers is established by testing different window sizes, and our new encoding scheme is then tested at this optimal window size via sevenfold cross-validation. The results show a 2% increase in the accuracy of the binary classifiers compared with the classical orthogonal matrix. Finally, to combine the results of the six SVM binary classifiers, a new tertiary classifier that combines the results of the one-versus-one binary classifiers is introduced, and its performance is compared with that of existing tertiary classifiers. The Q3 prediction accuracy of the new tertiary classifier reaches 78.8%, which is better than the best result reported in the literature.
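
As a rough illustration of two ideas named in the abstract, the sketch below shows a sliding-window orthogonal (one-hot) encoding of a protein sequence and a simple way to merge one-versus-one binary decisions into a three-state call. Everything specific here is an assumption made for illustration: the function names, the default window length of 13, the H/E/C state labels, the extra padding symbol for positions beyond the chain termini, and the majority vote with a margin tie-break. The vote is a generic stand-in and is not the new tertiary classifier proposed in the paper, whose details the abstract does not give.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAD = len(AMINO_ACIDS)        # hypothetical extra symbol for positions past a terminus
N_SYMBOLS = len(AMINO_ACIDS) + 1
STATES = "HEC"                # assumed labels: helix, strand, coil


def encode_windows(sequence, window=13):
    """Return one orthogonal (one-hot) feature vector per residue.

    Each vector concatenates `window` one-hot blocks of N_SYMBOLS entries,
    centred on the residue being classified. The window length is an
    illustrative default, not the optimum reported in the paper.
    """
    half = window // 2
    features = []
    for center in range(len(sequence)):
        row = [0.0] * (window * N_SYMBOLS)
        for offset in range(-half, half + 1):
            pos = center + offset
            inside = 0 <= pos < len(sequence)
            symbol = AA_INDEX[sequence[pos]] if inside else PAD
            row[(offset + half) * N_SYMBOLS + symbol] = 1.0
        features.append(row)
    return features


def vote_one_vs_one(decisions):
    """Combine pairwise decision values into a single state.

    `decisions` maps a pair (a, b) to a signed score, where a positive
    score favours `a` and a non-positive score favours `b`. Ties in the
    vote count are broken by the summed absolute margin of the wins.
    """
    votes = {s: 0 for s in STATES}
    margin = {s: 0.0 for s in STATES}
    for (a, b), score in decisions.items():
        winner = a if score > 0 else b
        votes[winner] += 1
        margin[winner] += abs(score)
    return max(STATES, key=lambda s: (votes[s], margin[s]))


if __name__ == "__main__":
    x = encode_windows("ACDEFGHIK", window=5)
    print(len(x), len(x[0]))
    print(vote_one_vs_one({("H", "E"): 1.2, ("E", "C"): -0.4, ("C", "H"): -0.9}))
```

In practice the signed scores would come from the decision functions of trained binary SVMs, and the orthogonal blocks could be swapped for hydrophobicity or BLOSUM62 rows to approximate the other encodings the abstract mentions.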
