A representer theorem for deep kernel learning

In this paper, we provide a representer theorem for the concatenation of (linear combinations of) kernel functions of reproducing kernel Hilbert spaces. This result serves as a mathematical foundation for the analysis of machine learning algorithms that are based on compositions of functions. As a direct consequence, the corresponding infinite-dimensional minimization problems can be recast as (nonlinear) finite-dimensional minimization problems, which can be tackled with standard nonlinear optimization algorithms. Moreover, we show how concatenated machine learning problems can be reformulated as neural networks and how our representer theorem applies to a broad class of state-of-the-art deep learning methods.
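To make the statement concrete, here is a minimal sketch of the layerwise representation such a theorem yields; the notation is illustrative (K_\ell denotes the kernel of the \ell-th reproducing kernel Hilbert space, N the number of training samples), and the precise hypotheses are those of the paper. The classical representer theorem states that the minimizer of a regularized empirical risk over a single RKHS admits the finite expansion

    f^*(\cdot) = \sum_{i=1}^{N} \alpha_i \, K(x_i, \cdot).

For a concatenation f = f_L \circ \dots \circ f_1, a layerwise analogue reads

    f_\ell^*(\cdot) = \sum_{i=1}^{N} K_\ell\bigl(x_i^{(\ell-1)}, \cdot\bigr)\, \alpha_i^{(\ell)},
    \qquad x_i^{(\ell-1)} := \bigl(f_{\ell-1}^* \circ \dots \circ f_1^*\bigr)(x_i),

i.e. each layer is a finite kernel expansion centered at the images of the training data under the preceding layers. Substituting these expansions into the risk functional leaves only the coefficients \alpha_i^{(\ell)} as unknowns, which is the announced finite-dimensional, but nonlinear, minimization problem: the centers x_i^{(\ell-1)} of layer \ell depend on the coefficients of all earlier layers. The following is a minimal Python/JAX sketch of this finite-dimensional problem for two layers with Gaussian kernels; all concrete choices (kernel, hidden width, regularizer weight, learning rate, plain gradient descent) are illustrative assumptions, not the paper's setup.

    # Minimal sketch (not the paper's algorithm): after applying the representer
    # theorem layerwise, a two-layer composition f2 ∘ f1 reduces to optimizing
    # the finitely many coefficients A1 (layer 1) and a2 (layer 2).
    import jax
    import jax.numpy as jnp

    def rbf(X, Y, gamma=1.0):
        """Gaussian kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return jnp.exp(-gamma * d2)

    key = jax.random.PRNGKey(0)
    kx, k1, k2 = jax.random.split(key, 3)
    X = jax.random.normal(kx, (50, 3))      # training inputs x_1, ..., x_50
    y = jnp.sin(X.sum(axis=1))              # scalar training targets
    K1 = rbf(X, X)                          # layer-1 kernel on the data

    params = {
        "A1": 0.1 * jax.random.normal(k1, (50, 5)),  # layer-1 coefficients (width 5)
        "a2": 0.1 * jax.random.normal(k2, (50,)),    # layer-2 coefficients
    }
    lam = 1e-3  # regularization weight (illustrative)

    def loss(p):
        H = K1 @ p["A1"]            # f1(x_i): representer form of layer 1
        K2 = rbf(H, H)              # layer-2 kernel, evaluated at the f1(x_i)
        pred = K2 @ p["a2"]         # f2(f1(x_i)): representer form of layer 2
        # RKHS norms in representer form: ||f||^2 = alpha^T K alpha
        reg = jnp.trace(p["A1"].T @ K1 @ p["A1"]) + p["a2"] @ K2 @ p["a2"]
        return jnp.mean((pred - y) ** 2) + lam * reg

    grad = jax.jit(jax.grad(loss))  # finite-dimensional but nonconvex problem,
    lr = 0.05                       # so we use plain gradient descent
    for _ in range(500):
        g = grad(params)
        params = jax.tree_util.tree_map(lambda p, gi: p - lr * gi, params, g)

Note how the hidden representation H plays the role of a neural-network layer: the kernel expansions stacked in this way are exactly the reformulation as a (kernel-parametrized) network mentioned above.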
