Optimization of SVM parameters for recognition of regulatory DNA sequences

Identification and recognition of specific functionally important DNA sequence fragments, such as regulatory sequences, are among the most important problems in bioinformatics. One type of such fragment is the promoter, i.e., a short regulatory DNA sequence located upstream of a gene. Detection of regulatory DNA sequences is important for successful gene prediction and gene expression studies. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize the regulatory sequences. To achieve optimal classification, various SVM learning and kernel parameters (hyperparameters) and methods for their optimization are analyzed. In a case study, the SVM hyperparameters for linear, polynomial, and power series kernels are optimized using a modification of the Nelder–Mead (downhill simplex) algorithm. The method improves the precision with which regulatory DNA sequences are identified. Results of promoter recognition on Drosophila sequence datasets are presented.
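As an illustration of the kind of search the abstract refers to, the following is a minimal sketch of the standard Nelder–Mead (downhill simplex) method in plain Python. It is not the modified algorithm used in the paper, whose details are not given here. The objective `toy_cv_error` is a hypothetical stand-in for a cross-validated SVM error measured over two hyperparameters on a log scale; in practice it would be replaced by a function that trains and validates an SVM. All function and parameter names are assumptions made for this example.

```python
def nelder_mead(f, x0, step=0.5, max_iter=500, tol=1e-8,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f from x0 with the standard (unmodified) Nelder-Mead method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]

        # Stop when the simplex is small and flat.
        spread = values[-1] - values[0]
        size = max(abs(a - b) for p in simplex[1:]
                   for a, b in zip(p, simplex[0]))
        if spread < tol and size < tol:
            break

        # Centroid of all vertices except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(alpha)
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            # Reflection is better than the second worst: accept it.
            simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[0]:
            # Reflection is the new best: try expanding further.
            expanded = along(gamma)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        else:
            # Reflection did not help: contract towards the centroid.
            contracted = along(-rho)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Contraction failed: shrink every vertex towards the best.
                best = simplex[0]
                simplex = [best] + [
                    [b + sigma * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i]


def toy_cv_error(params):
    # Hypothetical surrogate for cross-validation error, NOT real SVM output.
    # params are two hyperparameters on a log10 scale; minimum at (1, -2).
    u, v = params
    return (u - 1.0) ** 2 + 0.5 * (v + 2.0) ** 2 + 0.1


best_params, best_error = nelder_mead(toy_cv_error, [0.0, 0.0])
print(best_params, best_error)
```

Searching on a log scale is a common choice for SVM hyperparameters because useful values often span several orders of magnitude; whether the paper does the same is not stated in the abstract.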
