KeBABS: an R package for kernel-based analysis of biological sequences

KeBABS provides a powerful, flexible and easy-to-use framework for Kernel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, including variants that take sequence annotations and positional information into account. KeBABS integrates three common support vector machine (SVM) implementations behind a unified interface. It supports hyperparameter selection by cross validation and nested cross validation, and it also offers grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections.
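To make the notion of a sequence kernel concrete, the following is a minimal sketch of a spectrum kernel, which compares two sequences by the inner product of their k-mer count vectors. It is written in plain Python for illustration only and is not the KeBABS implementation or its R interface; the function names `kmer_counts` and `spectrum_kernel` and the cosine-style normalization are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3, normalized=True):
    """Illustrative spectrum kernel (hypothetical helper, not the KeBABS API).

    Returns the inner product of the k-mer count vectors of x and y.
    With normalized=True the value is divided by the geometric mean of
    the two self-similarities, so identical sequences score 1.0.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the product.
    dot = sum(n * cy[kmer] for kmer, n in cx.items())
    if not normalized:
        return float(dot)
    nx = sum(n * n for n in cx.values())
    ny = sum(n * n for n in cy.values())
    if nx == 0 or ny == 0:
        # A sequence shorter than k has no k-mers at all.
        return 0.0
    return dot / sqrt(nx * ny)
```

A kernel matrix built from pairwise calls to such a function can be passed to any SVM that accepts precomputed kernels. The annotation-aware and position-dependent variants mentioned above change how k-mer matches are counted or weighted, not this overall structure.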
