Paper Information - Multi-Kernel Regression with Sparsity Constraint

In this paper, we provide a Banach-space formulation of supervised learning with generalized total-variation (gTV) regularization. We identify the class of kernel functions that are admissible in this framework. Then, we propose a variation of supervised learning in a continuous-domain hybrid search space with gTV regularization. We show that the solution admits a multi-kernel expansion with adaptive positions. In this representation, the number of active kernels is upper-bounded by the number of data points, while the gTV regularization imposes an $\ell_1$ penalty on the kernel coefficients. Finally, we illustrate our theoretical results with numerical examples.

Michael Unser | Shayan Aziznejad
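
The abstract describes a solution built from several kernels whose coefficients carry an $\ell_1$ penalty. A rough way to see what such a solution can look like is the discretized sketch below. It is not the authors' algorithm: it assumes a fixed grid of candidate positions (the paper's positions are adaptive in the continuous domain), Gaussian kernels with two hypothetical widths, a quadratic data-fidelity term, and plain ISTA as the solver. All function names, data, and parameter values are illustrative.

```python
import numpy as np

def gaussian_dictionary(x, centers, widths):
    """Stack Gaussian kernels of every width, centred at every candidate position."""
    blocks = [np.exp(-0.5 * ((x[:, None] - centers[None, :]) / w) ** 2) for w in widths]
    return np.hstack(blocks)  # shape: (len(x), len(widths) * len(centers))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(K, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - K a||^2 + lam * ||a||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(K, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(K.shape[1])
    history = []
    for _ in range(n_iter):
        a = soft_threshold(a - step * K.T @ (K @ a - y), step * lam)
        history.append(0.5 * np.sum((y - K @ a) ** 2) + lam * np.sum(np.abs(a)))
    return a, history

# Synthetic 1-D data (assumption: noisy samples of a sine wave).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size)

centers = np.linspace(0.0, 1.0, 101)  # fixed grid standing in for adaptive positions
widths = (0.05, 0.2)                  # two kernels: one narrow, one wide
K = gaussian_dictionary(x, centers, widths)

coeffs, history = ista(K, y, lam=0.1)
active = np.flatnonzero(coeffs)
print(f"{active.size} active kernels out of {coeffs.size} candidates")
```

Increasing `lam` in this sketch should drive more coefficients to exactly zero, which is the sparsity effect the $\ell_1$ penalty is meant to produce. The bound on the number of active kernels stated in the abstract concerns the continuous-domain problem and is not guaranteed for the iterates of this grid-based approximation.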
