Application of Support Vector Machines in Bioinformatics

Support vector machines (SVM), a recently developed learning method, have shown results comparable to or better than those of neural networks in some applications. In this thesis we explore the use of SVM for three important problems in bioinformatics: the prediction of protein secondary structure, multi-class protein fold recognition, and the prediction of human signal peptide cleavage sites. Using data similar to those of earlier neural network studies, we demonstrate that SVM can readily achieve accuracy comparable to that of neural networks. Applying SVM to further bioinformatics problems is therefore a promising direction for future work.
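The abstract does not say how sequence data are turned into SVM inputs. The following is a minimal sketch of one common approach, under assumptions that are ours rather than the thesis's. Each residue is represented by a sliding window of its neighbours (13 positions here). Each window position is one-hot encoded over the 20 standard amino acids, plus one slot that marks positions beyond the chain ends. A multi-class decision (e.g. helix/strand/coil, or fold classes) is then assembled from binary SVMs by one-against-one majority voting. `encode_window`, `one_vs_one_predict` and the `decide` callback are hypothetical names. `decide` stands in for trained binary SVMs, which are not implemented here.

```python
# Hypothetical sketch: fixed-length SVM inputs from a protein sequence,
# plus one-against-one voting over binary classifiers.
# Assumptions (not from the thesis): odd window size, one-hot encoding over
# 20 standard amino acids + 1 padding slot, majority vote with ties broken
# by class order.
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
DIM = len(AMINO_ACIDS) + 1  # extra slot marks positions outside the chain


def encode_window(sequence, center, window=13):
    """One-hot encode the residues in a window centred on `center`.

    Returns a flat list of length window * DIM; every window position
    contributes exactly one 1.0 entry.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so it has a centre residue")
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        slot = [0.0] * DIM
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            slot[AMINO_ACIDS.index(sequence[pos])] = 1.0
        else:
            slot[-1] = 1.0  # past a chain end, or a non-standard residue such as 'X'
        features.extend(slot)
    return features


def one_vs_one_predict(x, classes, decide):
    """Combine k*(k-1)/2 binary decisions by majority vote.

    `decide(a, b, x)` is a hypothetical stand-in for a trained binary SVM
    separating class `a` from class `b`; it must return either a or b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    # max() returns the first maximal item, so ties go to the earlier class.
    return max(classes, key=lambda c: votes[c])
```

For example, `encode_window("MKTAYIAKQR", 0)` yields a 273-dimensional vector (13 × 21) whose first six window positions are padding. One-against-one voting needs k(k-1)/2 binary classifiers for k classes: only three for the helix/strand/coil states, but the number grows quadratically with the number of fold classes.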
