Paper information - Regularization Strategies and Empirical Bayesian Learning for MKL

Multiple kernel learning (MKL), structured sparsity, and multi-task learning have recently received considerable attention. In this paper, we show how different MKL algorithms can be understood as applications of either regularization on the kernel weights or block-norm-based regularization, which is more common in structured sparsity and multi-task learning. We show that these two regularization strategies can be systematically mapped to each other through a concave conjugate operation. When the kernel-weight-based regularizer is separable into components, we can naturally consider a generative probabilistic model behind MKL. Based on this model, we propose learning algorithms for the kernel weights through the maximization of marginal likelihood. We show through numerical experiments that $\ell_2$-norm MKL and Elastic-net MKL achieve accuracy comparable to that of uniform kernel combination. Although uniform kernel combination might be preferable for its simplicity, $\ell_2$-norm MKL and Elastic-net MKL can learn the usefulness of the information sources represented as kernels. In particular, Elastic-net MKL achieves sparsity in the kernel weights.

Taiji Suzuki | Ryota Tomioka
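
The abstract states that kernel-weight regularization and block-norm regularization can be mapped to each other. As a minimal sketch of one well-known instance of such a correspondence (an illustration under our own assumptions, not code or notation taken from the paper), the block 1-norm $\sum_m \|w_m\|_2$ equals the minimum over positive weights $d_m$ of $\frac{1}{2}\sum_m (\|w_m\|_2^2 / d_m + d_m)$, attained at $d_m = \|w_m\|_2$. The function names, block sizes, and random data below are hypothetical and chosen only for the check.

```python
import numpy as np

def block_norm_penalty(blocks):
    """Block 1-norm: sum of the Euclidean norms of the blocks."""
    return sum(np.linalg.norm(w) for w in blocks)

def weighted_penalty(blocks, d):
    """Weight-based form: 0.5 * sum_m (||w_m||^2 / d_m + d_m), with d_m > 0."""
    return 0.5 * sum(np.dot(w, w) / dm + dm for w, dm in zip(blocks, d))

# Hypothetical example data: three blocks of different sizes.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal(n) for n in (3, 5, 2)]

# The minimizing weights are the block norms themselves.
d_star = np.array([np.linalg.norm(w) for w in blocks])

lhs = block_norm_penalty(blocks)
rhs_at_min = weighted_penalty(blocks, d_star)

# Any other positive weights give a value at least as large (AM-GM inequality).
d_other = d_star * rng.uniform(0.5, 2.0, size=len(blocks))
rhs_other = weighted_penalty(blocks, d_other)

print(f"block norm penalty:        {lhs:.6f}")
print(f"weighted form at d = norm: {rhs_at_min:.6f}")
print(f"weighted form elsewhere:   {rhs_other:.6f}")
```

The first two printed values agree, while the third is never smaller. This is the sense in which a penalty on the weights and a block-norm penalty can describe the same regularizer in this simple case.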
