Multi-Kernel Regression with Sparsity Constraint

In this paper, we provide a Banach-space formulation of supervised learning with generalized total-variation (gTV) regularization. We identify the class of kernel functions that are admissible in this framework. Then, we propose a variation of supervised learning in a continuous-domain hybrid search space with gTV regularization. We show that the solution admits a multi-kernel expansion with adaptive positions. In this representation, the number of active kernels is upper-bounded by the number of data points, while the gTV regularization imposes an $\ell_1$ penalty on the kernel coefficients. Finally, we illustrate our theoretical results with numerical examples.
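The link between a kernel expansion and an $\ell_1$ penalty on its coefficients can be illustrated with a discretized stand-in. This sketch is not the method of the paper. It fixes a grid of candidate kernel positions instead of optimizing them continuously, assumes a hypothetical dictionary of Gaussian kernels with two widths, and solves the resulting finite problem $\min_{\mathbf a} \tfrac12\|\mathbf y - \mathbf H\mathbf a\|_2^2 + \lambda\|\mathbf a\|_1$ with ISTA (iterative soft-thresholding). For such a LASSO with $M$ data points, a minimizer with at most $M$ nonzero coefficients always exists, which loosely mirrors the bound on the number of active kernels. The iterative solver may still need many iterations before small spurious coefficients reach zero. The names `kernel_dictionary`, `fit_sparse_multikernel`, the widths, and the value of `lam` are illustrative assumptions.

```python
import numpy as np

def kernel_dictionary(x, centers, widths):
    """Stack Gaussian columns h_w(x_m - t_k) for every width w and candidate
    position t_k. The Gaussian family is a hypothetical choice for illustration."""
    diff = x[:, None] - centers[None, :]
    return np.hstack([np.exp(-diff ** 2 / (2.0 * w ** 2)) for w in widths])

def soft_threshold(v, thresh):
    # Proximal operator of thresh * ||.||_1 (entrywise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def objective(H, a, y, lam):
    # Data-fidelity term plus l1 penalty on the kernel coefficients.
    r = H @ a - y
    return 0.5 * r @ r + lam * np.abs(a).sum()

def fit_sparse_multikernel(x, y, centers, widths, lam, n_iter=20000):
    """ISTA for min_a 0.5 * ||y - H a||^2 + lam * ||a||_1 on a fixed grid of
    kernel positions (a stand-in for the adaptive positions of the paper)."""
    H = kernel_dictionary(x, centers, widths)
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(H.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * (H.T @ (H @ a - y)), step * lam)
    return H, a

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=12))            # M = 12 data points
y = np.sin(3.0 * x) + 0.05 * rng.standard_normal(x.size)
centers = np.linspace(-1.0, 1.0, 81)                    # candidate kernel positions
widths = (0.1, 0.4)                                     # two hypothetical kernel widths
lam = 0.05
H, a = fit_sparse_multikernel(x, y, centers, widths, lam)
active = np.flatnonzero(np.abs(a) > 1e-6)
print(f"{active.size} active kernels out of {a.size} candidates, M = {x.size}")
print("objective:", objective(H, a, y, lam),
      "vs. zero model:", objective(H, np.zeros_like(a), y, lam))
```

Fixing the positions on a grid turns the continuous-domain problem into a finite LASSO; refining the grid only approximates positions that adapt freely to the data.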
