An L1 Representer Theorem for Multiple-Kernel Regression

The theory of reproducing kernel Hilbert spaces (RKHS) provides an elegant framework for supervised learning and underlies kernel methods in machine learning. Implicit in its formulation is the use of a quadratic regularizer associated with the underlying inner product, which imposes smoothness constraints. In this paper, we consider instead the generalized total-variation (gTV) norm as a sparsity-promoting regularizer. This leads us to propose a new Banach-space framework that justifies the use of the generalized LASSO, albeit in a slightly modified form. We prove a representer theorem for multiple-kernel regression (MKR) with gTV regularization. The theorem states that the solutions of MKR admit kernel expansions with adaptive positions, while the gTV norm enforces an $\ell_1$ penalty on the coefficients. We also discuss the sparsity-promoting effect of the gTV norm, which prevents redundancy in the multiple-kernel scenario.
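To make the shape of this result concrete, the following is a hedged sketch of the kind of optimization problem and solution form that an $\ell_1$ representer theorem of this type describes; the loss $E$, the regularization operators $\mathrm{L}_m$, the kernels $h_m$, the centers $\tau_{m,k}$, and the null-space components $p_m$ are illustrative placeholders rather than the paper's exact notation:
$$
\min_{f_1,\dots,f_M}\ \sum_{i=1}^{N} E\!\Big(y_i,\ \sum_{m=1}^{M} f_m(x_i)\Big) \;+\; \lambda \sum_{m=1}^{M} \|\mathrm{L}_m f_m\|_{\mathcal{M}},
$$
with solutions of the form
$$
f_m(x) \;=\; \sum_{k=1}^{K_m} a_{m,k}\, h_m(x - \tau_{m,k}) \;+\; p_m(x), \qquad \|\mathrm{L}_m f_m\|_{\mathcal{M}} \;=\; \sum_{k=1}^{K_m} |a_{m,k}|,
$$
where the number of atoms $K_m$ is bounded by the number of data points, the positions $\tau_{m,k}$ adapt to the data rather than being pinned to the sample locations, and $p_m$ lies in the finite-dimensional null space of $\mathrm{L}_m$. The $\ell_1$ penalty on the coefficients $a_{m,k}$ is what links the continuous-domain problem to a generalized LASSO and discourages redundant kernels in the multiple-kernel setting.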
