Learning with Boundary Conditions

Kernel machines traditionally arise from an elegant formulation in which the smoothness of admissible solutions is measured by the norm in the reproducing kernel Hilbert space (RKHS) generated by the chosen kernel. It has been pointed out that they can also be formulated in a related functional framework, in which the Green’s function of a suitable differential operator plays the role of the kernel. In this letter, we give our own picture of this intriguing connection, emphasizing some relevant distinctions between these two ways of measuring the smoothness of admissible solutions. In particular, we show that for some kernels there is no associated differential operator. We especially emphasize the crucial relevance of boundary conditions, which are in fact the truly distinguishing feature of the approach based on differential operators. We provide a general solution to the problem of learning from data and boundary conditions and illustrate with examples the significant role played by boundary conditions. It turns out that the degree of freedom that arises in the traditional formulation of kernel machines is indeed a limitation, which is partly overcome by incorporating the boundary conditions. This likely holds true in many real-world applications in which there is prior knowledge about the expected behavior of classifiers and regressors on the boundary.
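
To make the link between Green's functions, kernels, and boundary conditions concrete, the following is a minimal one-dimensional sketch. It is a standard textbook case chosen here for illustration, not an example taken from the letter; the operator $L$, the interval $[0,1]$, the samples $(x_i, y_i)$, and the regularization weight $\lambda$ are all assumptions of the sketch.

```latex
% Assumed setting: L = -d^2/dx^2 on [0,1], regularizer \int_0^1 f'(x)^2 dx.
% Data: (x_i, y_i), i = 1, ..., n, with x_i in (0,1); weight \lambda > 0.
\[
  \min_{f}\; \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2
  + \lambda \int_0^1 f'(x)^2 \, dx .
\]
% Stationarity gives, in the sense of distributions,
\[
  -\lambda f''(x) = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)\,\delta(x - x_i),
  \qquad\text{hence}\qquad
  f(x) = \sum_{i=1}^{n} \alpha_i \, g(x, x_i),
  \quad (G + \lambda I)\,\alpha = y, \quad G_{ij} = g(x_i, x_j),
\]
% where g is the Green's function of L, i.e. -\partial_x^2 g(x,y) = \delta(x-y),
% subject to the chosen boundary conditions.
%
% (a) Dirichlet conditions f(0) = f(1) = 0:
\[
  g_{\mathrm{D}}(x, y) = \min(x, y) - x\,y .
\]
% (b) Mixed conditions f(0) = 0, f'(1) = 0:
\[
  g_{\mathrm{M}}(x, y) = \min(x, y) .
\]
```

In this sketch the same operator yields two different kernels, and the only thing that changes between (a) and (b) is the boundary condition: $g_{\mathrm{D}}$ forces every solution to vanish at both endpoints, whereas $g_{\mathrm{M}}$ leaves the value at $x = 1$ free.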
