Protein secondary structure prediction using support vector machines

Support vector machines (SVMs) are prominent among the computational methods used to predict protein secondary structure. This work predicts the secondary structure of a protein from its primary amino acid sequence using SVMs. The proposed methodology takes as input features derived from structural motifs, or text strings, of the primary structure that are associated with the secondary structure, such as the R-group and the probability that the amino acid at the central position adopts a particular secondary structure. Features are extracted with a sequence-coding method in which each symbol of the primary structure is associated with a symbol of the secondary structure. This encoding reduces the dimensionality of the data from thousands of features to only 220. The results are comparable to those reported in the literature, with an accuracy of about 70%. In addition, the computational cost of building the classifiers can be reduced because the multiclass problem is modeled as a group of binary classifiers.
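
The two ideas in the abstract, encoding a sequence into fixed-length feature vectors and splitting the multiclass problem into binary ones, can be illustrated with a minimal sketch. It is not the authors' method: the sliding-window one-hot encoding, the three-state labels H/E/C, the one-vs-rest split, and the names `window_features` and `one_vs_rest_targets` are all assumptions made for illustration. The window width of 11 is chosen only so that the vector length comes to 220; the abstract does not say how the authors arrive at that number.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CLASSES = "HEC"  # helix, strand, coil (assumed three-state labels)


def window_features(seq, center, width=11):
    """One-hot encode the residues in a window centred on `center`.

    Positions that fall outside the sequence, or residues outside the
    20-letter alphabet, are encoded as all zeros.
    """
    half = width // 2
    vec = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                one_hot[idx] = 1
        vec.extend(one_hot)
    return vec


def one_vs_rest_targets(structure):
    """Turn a three-state label string into one +1/-1 target list per class."""
    return {c: [1 if s == c else -1 for s in structure] for c in CLASSES}


# Hypothetical toy data, not taken from the paper.
seq = "MKTAYIAKQR"
ss = "CCHHHHHEEC"

X = [window_features(seq, i) for i in range(len(seq))]
targets = one_vs_rest_targets(ss)
```

Under these assumptions, each list in `targets` would be paired with `X` to train one binary SVM, and a residue would be assigned to the class whose classifier returns the largest decision value.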
