Biological applications of support vector machines

One of the major tasks in bioinformatics is the classification and prediction of biological data. With the rapid growth of biological databanks, it is essential to use computer programs to automate the classification process. At present, the computer programs that give the best prediction performance are support vector machines (SVMs). This is because SVMs are designed to maximise the margin separating two classes, so that the trained model generalises well to unseen data. Most other computer programs implement a classifier by minimising the error incurred during training, which leads to poorer generalisation. For this reason, SVMs have been widely applied in many areas of bioinformatics, including protein function prediction, protease functional site recognition, transcription initiation site prediction and gene expression data classification. This paper discusses the principles of SVMs and their application to the analysis of biological data, mainly protein and DNA sequences.
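
The margin-maximisation idea can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the method of any cited study. The peptides are invented toy data. The names composition, train_linear_svm and predict are hypothetical. The classifier is a linear soft-margin SVM without a bias term, trained by sub-gradient descent on the regularised hinge loss, and sequences are encoded by their amino acid composition.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Encode a peptide as its 20-dimensional amino acid composition."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100, seed=0):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent.

    Assumption: no bias term, so the separating hyperplane passes
    through the origin. This simplification suits the toy data only.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * dot(w, X[i])
            # The regulariser shrinks w; a smaller norm means a wider margin.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge loss is active only for points inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, seq):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if dot(w, composition(seq)) > 0 else -1


# Invented toy data: hydrophobic-rich (+1) versus charged-rich (-1) peptides.
train_seqs = ["AVLIVAL", "LLIVAAV", "VIALLVA", "DEKRDEK", "KRDEEKD", "EDKRKED"]
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm([composition(s) for s in train_seqs], labels)
print(predict(w, "LAVIVLA"), predict(w, "KDERKDE"))
```

Real applications to protein and DNA sequences generally rely on kernel functions and dedicated SVM solvers rather than a hand-written linear trainer. The sketch is meant only to show how the regularisation term and the hinge loss together express the trade-off between a wide margin and training error.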
