Multiple Operator-valued Kernel Learning

Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels that are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an $\ell_r$-norm constraint on the combination coefficients (r ≥ 1). The resulting optimization problem is more involved than that of multiple scalar-valued kernel learning, since operator-valued kernels pose additional technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm that solves a system of linear operator equations using a block coordinate-descent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces.
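To make the alternating structure concrete, the following is a minimal finite-dimensional sketch, not the algorithm of the paper. It assumes that functional responses are discretized into vectors, that each operator-valued kernel is separable with the identity as its output operator (so the operator system reduces to an ordinary linear system), and that the coefficient update takes the closed form known from scalar $\ell_p$-norm multiple kernel learning. The names `gaussian_gram` and `mkl_ridge` and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_gram(X, gamma):
    """Gaussian Gram matrix of the rows of X (scalar part of the kernel)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(dist2, 0.0))


def mkl_ridge(grams, Y, lam=0.1, r=2.0, n_iter=100, tol=1e-10):
    """Alternate between a ridge solve and a closed-form weight update.

    grams : list of (n, n) positive semidefinite Gram matrices
    Y     : (n, p) discretized functional responses
    Assumption: separable kernels with identity output operator, so the
    linear system is (sum_k d_k K_k + lam * I) C = Y.
    """
    n, M = Y.shape[0], len(grams)
    d = np.full(M, M ** (-1.0 / r))  # feasible start with ||d||_r = 1
    for _ in range(n_iter):
        # Block 1: fix d, solve the regularized linear system for C.
        K = sum(dk * Kk for dk, Kk in zip(d, grams))
        C = np.linalg.solve(K + lam * np.eye(n), Y)
        # Block 2: fix C, update d in closed form (assumed update,
        # borrowed from scalar lp-norm multiple kernel learning).
        norms2 = np.array([dk ** 2 * np.sum(C * (Kk @ C))
                           for dk, Kk in zip(d, grams)])
        norms2 = np.maximum(norms2, 1e-12)  # guard against zero norms
        d_new = norms2 ** (1.0 / (r + 1.0))
        d_new /= np.sum(norms2 ** (r / (r + 1.0))) ** (1.0 / r)
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
    # Final solve so that C is consistent with the returned weights.
    K = sum(dk * Kk for dk, Kk in zip(d, grams))
    C = np.linalg.solve(K + lam * np.eye(n), Y)
    return d, C
```

With Gram matrices built by `gaussian_gram` at several bandwidths, `mkl_ridge(grams, Y)` returns nonnegative weights `d` of unit $\ell_r$ norm and a coefficient matrix `C`; fitted responses on the training inputs are then `sum(dk * Kk for dk, Kk in zip(d, grams)) @ C`. In the genuinely infinite-dimensional setting the first block would require solving an operator equation rather than calling a dense solver.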
