Multiple Kernel SVM Based on Two-Stage Learning

In this paper we introduce the idea of two-stage learning for the multiple kernel SVM (MKSVM) and present a new MKSVM algorithm based on two-stage learning (MKSVM-TSL). The first stage is pre-learning, whose aim is to extract information about the data so that the samples that are “important” for classification can be generated in the second, formal learning stage; these samples form a uniformly ergodic Markov chain (u.e.M.c.). To study the proposed MKSVM-TSL algorithm comprehensively, we estimate the generalization bound of MKSVM based on u.e.M.c. samples and obtain its fast learning rate. To better demonstrate the performance of the proposed algorithm, we also perform numerical experiments on various publicly available datasets. The experimental results show that, compared with three classical multiple kernel learning (MKL) algorithms, MKSVM-TSL performs better in three respects: the total time of sampling and training, the accuracy, and the sparsity of the classifiers.
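The two-stage idea can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the algorithm of this paper: the base kernels are Gaussian with fixed, uniform weights, a regularized least-squares kernel classifier stands in for the SVM solver, and the acceptance rule is a simple Metropolis-style ratio of exponentiated hinge losses under the pre-learned model. The function names (`combined_kernel`, `fit_kernel_classifier`, `markov_sample`) and the synthetic data are hypothetical.

```python
import numpy as np

def combined_kernel(A, B, gammas, weights):
    """Convex combination of Gaussian kernels: sum_m w_m * exp(-g_m * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return sum(w * np.exp(-g * sq) for g, w in zip(gammas, weights))

def fit_kernel_classifier(X, y, gammas, weights, lam=1e-2):
    """Assumed stand-in for the SVM solver: regularized least squares on labels in {-1, +1}."""
    K = combined_kernel(X, X, gammas, weights)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda Z: combined_kernel(Z, X, gammas, weights) @ alpha

def hinge(f, X, y):
    """Pointwise hinge loss of decision function f on (X, y)."""
    return np.maximum(0.0, 1.0 - y * f(X))

def markov_sample(f0, X, y, n, rng):
    """Build a chain of n training indices; candidates are accepted with an
    assumed Metropolis-style probability based on the loss under f0."""
    losses = hinge(f0, X, y)
    current = rng.integers(len(X))
    chain = [current]
    while len(chain) < n:
        cand = rng.integers(len(X))
        # Ratio exp(-loss(cand)) / exp(-loss(current)), capped at 1.
        accept_prob = min(1.0, np.exp(losses[current] - losses[cand]))
        if rng.random() < accept_prob:
            current = cand
            chain.append(current)  # only accepted candidates extend the chain
    return np.array(chain)

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.concatenate([-np.ones(200), np.ones(200)])
gammas, weights = [0.1, 1.0, 10.0], [1 / 3, 1 / 3, 1 / 3]

# Stage 1 (pre-learning): fit a preliminary model on a small random subset.
pre_idx = rng.choice(len(X), size=50, replace=False)
f0 = fit_kernel_classifier(X[pre_idx], y[pre_idx], gammas, weights)

# Stage 2 (formal learning): draw Markov-chain samples guided by f0, then retrain.
idx = markov_sample(f0, X, y, 100, rng)
f = fit_kernel_classifier(X[idx], y[idx], gammas, weights)
accuracy = np.mean(np.sign(f(X)) == y)
```

In this sketch the second-stage model is trained on only 100 chain samples instead of all 400 points, which is the kind of saving in training cost that a sampling-based scheme aims for; learning the kernel weights and the SVM's sparsity are deliberately left out.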
