A Binary Classification Framework for Two-Stage Multiple Kernel Learning

With the advent of kernel methods, automating the task of specifying a suitable kernel has become increasingly important. In this context, the Multiple Kernel Learning (MKL) problem of finding a combination of prespecified base kernels that is suitable for the task at hand has received significant attention from researchers. In this paper we show that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel. Framing MKL in this way has the distinct advantage of making it easy to leverage the extensive research in binary classification to develop better-performing and more scalable MKL algorithms that are conceptually simpler and, arguably, more accessible to practitioners. Experiments on nine data sets from different domains show that, despite its simplicity, the proposed technique compares favorably with current leading MKL approaches.
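To make the framing concrete, the sketch below illustrates the general two-stage idea in Python: stage one treats each pair of training examples (i, j) as a pseudo-example whose label is y_i * y_j and whose features are the base-kernel values K_m(x_i, x_j), and learns non-negative combination weights with a hinge-style loss; stage two trains a standard SVM with the resulting combined kernel. The non-negativity constraint on the weights is what keeps the combined kernel positive semidefinite. The choice of base kernels, loss, optimizer, and helper names (base_kernels, learn_kernel_weights, combined_kernel) are illustrative assumptions, not the exact algorithm proposed in the paper.

```python
# Hypothetical sketch of the two-stage idea described in the abstract;
# details (kernels, loss, optimizer) are assumptions, not the paper's method.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel


def base_kernels(X, Z):
    """Evaluate a small bank of prespecified base kernels between X and Z."""
    return [
        linear_kernel(X, Z),
        polynomial_kernel(X, Z, degree=2),
        rbf_kernel(X, Z, gamma=0.5),
    ]


def learn_kernel_weights(X, y, lr=0.1, epochs=200):
    """Stage 1: learn non-negative weights mu via a binary classification
    problem over pairs: pair (i, j) has label y_i * y_j and feature vector
    (K_1(x_i, x_j), ..., K_p(x_i, x_j))."""
    Ks = base_kernels(X, X)
    F = np.stack([K.ravel() for K in Ks], axis=1)   # pair features, shape (n*n, p)
    t = np.outer(y, y).ravel().astype(float)        # pair labels y_i * y_j
    mu = np.ones(len(Ks)) / len(Ks)
    for _ in range(epochs):
        margins = t * (F @ mu)
        active = margins < 1.0                      # hinge-loss subgradient
        grad = -(F[active] * t[active, None]).mean(axis=0) if active.any() else 0.0
        mu = np.maximum(mu - lr * grad, 0.0)        # project onto mu >= 0 (keeps PSD)
    return mu / (mu.sum() + 1e-12)


def combined_kernel(mu, X, Z):
    """Non-negative combination of base kernels; PSD since each base kernel is."""
    return sum(w * K for w, K in zip(mu, base_kernels(X, Z)))


# Toy usage on synthetic data.
rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = np.sign(X[:, 0] * X[:, 1] + 0.1 * rng.randn(60))

mu = learn_kernel_weights(X, y)                     # stage 1: learn the kernel
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(combined_kernel(mu, X, X), y)               # stage 2: train SVM with it
print("learned kernel weights:", np.round(mu, 3))
print("train accuracy:", svm.score(combined_kernel(mu, X, X), y))
```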
