Prediction of the β-Hairpins in Proteins Using Support Vector Machine

A support vector machine (SVM) algorithm for predicting β-hairpin motifs is proposed, in which sequence information is represented by a composite vector that combines the increment of diversity with a scoring function. The method is evaluated on a dataset of 3,088 non-homologous proteins containing 6,027 β-hairpins. On the independent testing dataset, the overall prediction accuracy is 79.9% and the Matthews correlation coefficient is 0.59. On a dataset previously used by Kumar et al. (Nucleic Acids Res 33:154–159), a higher accuracy of 83.3% and a Matthews correlation coefficient of 0.67 are obtained for the independent testing dataset. The performance of the method is also evaluated by predicting the β-hairpins in the CASP6 proteins, and better results are obtained. Moreover, the method is used to predict four kinds of supersecondary structures, with an overall prediction accuracy of 64.5% on the independent testing dataset.
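As a rough illustration of the increment-of-diversity feature mentioned above, the sketch below uses the commonly cited form of the measure of diversity, D(X) = N ln N − Σ nᵢ ln nᵢ, with ID(X, Y) = D(X + Y) − D(X) − D(Y). The choice of amino-acid composition as the count vector, the natural logarithm, and all function names are assumptions made for this example and are not taken from the abstract.

```python
import math
from collections import Counter

# Assumption: features are counts of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def diversity(counts):
    """Measure of diversity D(X) = N ln N - sum(n_i ln n_i), taking 0 ln 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def composition(segment):
    """Amino-acid count vector of a sequence segment (hypothetical feature choice)."""
    tally = Counter(segment)
    return [tally[aa] for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical segments, used only to show the call pattern.
    query = composition("KVTVNGQEYRV")
    reference = composition("TVKVDGKTYEV")
    print(increment_of_diversity(query, reference))
```

A small ID value indicates that the two count vectors have similar distributions, so ID values computed against reference sets for each class could serve as components of the input vector to an SVM. How the paper combines them with the scoring function is not specified in the abstract.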

[1]  R. Laxton The measure of diversity. , 1978, Journal of theoretical biology.

[2]  Baldomero Oliva,et al.  An automated classification of the structure of protein loops. , 1997, Journal of molecular biology.

[3]  David C. Jones Predicting novel protein folds by using FRAGFOLD , 2001, Proteins.

[4]  Hu Chen,et al.  A novel method for protein secondary structure prediction using dual‐layer SVM and profiles , 2004, Proteins.

[5]  T. Werner,et al.  MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. , 1995, Nucleic acids research.

[6]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[7]  Jens Meiler,et al.  Strand‐loop‐strand motifs: Prediction of hairpins and diverging turns in proteins , 2004, Proteins.

[8]  Herbert Gish,et al.  Speaker identification via support vector classifiers , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[9]  S. Hua,et al.  A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. , 2001, Journal of molecular biology.

[10]  Liaofu Luo,et al.  Splice site prediction with quadratic discriminant analysis using diversity measure. , 2003, Nucleic acids research.

[11]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[12]  D Gorse,et al.  Prediction of the location and type of β‐turns in proteins using neural networks , 1999, Protein science : a publication of the Protein Society.

[13]  Gajendra P. S. Raghava,et al.  BhairPred: prediction of β-hairpins in a protein from multiple alignment information using ANN and SVM techniques , 2005, Nucleic Acids Res..

[14]  J. Thornton,et al.  Factors limiting the performance of prediction‐based fold recognition methods , 2008, Protein science : a publication of the Protein Society.

[15]  J. Thornton,et al.  PROMOTIF—A program to identify and analyze structural motifs in proteins , 1996, Protein science : a publication of the Protein Society.

[16]  Q. Z. Li,et al.  The prediction of the structural class of protein: application of the measure of diversity. , 2001, Journal of theoretical biology.

[17]  B. Rost,et al.  Protein fold recognition by prediction-based threading. , 1997, Journal of molecular biology.

[18]  Kuo-Chen Chou,et al.  Support Vector Machine for predicting α-turn types , 2003, Peptides.

[19]  Kuo-Chen Chou,et al.  Support vector machines for the classification and prediction of β‐turn types , 2002, Journal of peptide science : an official publication of the European Peptide Society.

[20]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[21]  M.M. Van Hulle,et al.  View-based 3D object recognition with support vector machines , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop.

[22]  V. Thorsson,et al.  HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. , 2000, Journal of molecular biology.

[23]  C M Deane,et al.  Improved protein loop prediction from sequence alone. , 2001, Protein engineering.

[24]  Kuo-Chen Chou,et al.  Prediction of β-turns with learning machines , 2003, Peptides.

[25]  Alexander E. Kel,et al.  MATCHTM: a tool for searching transcription factor binding sites in DNA sequences , 2003, Nucleic Acids Res..

[26]  Baldomero Oliva,et al.  ArchDB: automated protein loop classification as a tool for structural genomics , 2004, Nucleic Acids Res..

[27]  K. Chou,et al.  Prediction of β-turns , 2009 .

[28]  Thomas Werner,et al.  MatInspector and beyond: promoter analysis based on transcription factor binding sites , 2005, Bioinform..

[29]  Szymon M. Kielbasa,et al.  Measuring similarities between transcription factor binding sites , 2005, BMC Bioinformatics.

[30]  D Xu,et al.  Prediction of protein supersecondary structures based on the artificial neural network method. , 1997, Protein engineering.

[31]  Janet M. Thornton,et al.  Toward predicting protein topology: An approach to identifying β hairpins , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[32]  A. Sandelin,et al.  Applied bioinformatics for the identification of regulatory elements , 2004, Nature Reviews Genetics.