Multiple Kernel Learning Algorithms

In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These kernels may correspond to different notions of similarity, or they may use information coming from multiple sources (different representations or different feature subsets). To organize these methods and highlight their similarities and differences, we give a taxonomy of multiple kernel learning algorithms and review several of them. We perform experiments on real data sets to illustrate and compare existing algorithms. Although the algorithms may not differ much in accuracy, they do differ in three other respects: complexity, as measured by the number of stored support vectors; sparsity of the solution, as measured by the number of kernels used; and training time complexity. Overall, we see that using multiple kernels instead of a single one is useful. We believe that combining kernels in a nonlinear or data-dependent way is more promising than linear combination when fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
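The difference between linear, nonlinear and data-dependent combination can be made concrete with a minimal pure-Python sketch. It applies three combination rules to precomputed Gram matrices and counts how many kernels a weight vector actually uses. Several choices here are assumptions made for illustration. The base kernels (linear and Gaussian), the hand-fixed weights, and the softmax gating model with parameters `V` and `v0` are hypothetical. The Hadamard product is just one example of a nonlinear rule. The localized rule is written in the form commonly used for localized multiple kernel learning. A real MKL algorithm would learn the weights or gating parameters jointly with the classifier, and this sketch does not do that.

```python
import math

# Illustrative base kernels (assumed choices, not prescribed by the text).
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def gaussian_kernel(x, z, width=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / width ** 2)

def gram(kernel, X):
    """Gram matrix of `kernel` evaluated on every pair of points in X."""
    return [[kernel(x, z) for z in X] for x in X]

def linear_combination(grams, weights):
    """Convex combination K = sum_m eta_m * K_m with eta_m >= 0 and sum_m eta_m = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(grams[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, grams)) for j in range(n)]
            for i in range(n)]

def product_combination(grams):
    """One nonlinear rule: elementwise (Hadamard) product of the Gram matrices."""
    n = len(grams[0])
    out = [[1.0] * n for _ in range(n)]
    for K in grams:
        for i in range(n):
            for j in range(n):
                out[i][j] *= K[i][j]
    return out

def softmax_gate(x, V, v0):
    """Hypothetical gating model: eta_m(x) = softmax over m of (<v_m, x> + v0_m)."""
    scores = [sum(a * b for a, b in zip(v, x)) + b0 for v, b0 in zip(V, v0)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def localized_combination(kernels, X, V, v0):
    """Data-dependent rule: k(x_i, x_j) = sum_m eta_m(x_i) * k_m(x_i, x_j) * eta_m(x_j)."""
    etas = [softmax_gate(x, V, v0) for x in X]
    n = len(X)
    return [[sum(etas[i][m] * k(X[i], X[j]) * etas[j][m]
                 for m, k in enumerate(kernels))
             for j in range(n)] for i in range(n)]

def active_kernels(weights, tol=1e-6):
    """Sparsity of a linear combination: number of kernels with nonzero weight."""
    return sum(1 for w in weights if w > tol)

if __name__ == "__main__":
    X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    kernels = [linear_kernel, gaussian_kernel]
    grams = [gram(k, X) for k in kernels]
    weights = [0.3, 0.7]  # fixed by hand here; an MKL algorithm would learn these
    K_lin = linear_combination(grams, weights)
    K_prod = product_combination(grams)
    K_loc = localized_combination(kernels, X, V=[[1.0, -1.0], [-1.0, 1.0]], v0=[0.0, 0.0])
    print(K_lin[2][2], K_prod[0][1], K_loc[0][1], active_kernels(weights))
```

All three rules yield valid kernels when the base kernels are valid. The Hadamard product of positive semidefinite matrices is positive semidefinite by the Schur product theorem. Each term of the localized rule has the form D_m K_m D_m with D_m diagonal, which also preserves positive semidefiniteness.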

Mehmet Gönen et al., Multiple Kernel Learning Algorithms, 2011.