A theoretical framework for supervised learning from regions

Supervised learning is investigated for the case in which the data consist not only of labeled points but also of labeled regions of the input space. In the limit case, such regions degenerate to single points and the proposed approach reduces to the classical learning setting. The adopted framework entails the minimization of a functional obtained by introducing a loss function that involves such regions. An additive regularization term is expressed via differential operators that model the smoothness properties of the desired input/output relationship. Representer theorems are given, proving that the optimization problem associated with learning from labeled regions has a unique solution, which takes the form of a linear combination of kernel functions determined by the differential operators together with the regions themselves. As a relevant case, regions given by multi-dimensional intervals (i.e., "boxes") are investigated; these model prior knowledge expressed by logical propositions.
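
To make the structure of such a solution concrete, the following is a minimal one-dimensional sketch and not the construction of the paper. It assumes a Gaussian kernel (rather than a kernel derived from a specific differential operator), a squared loss on the average of the function over each region, and a simple uniform-grid approximation of the region averages. All function names and parameters (`fit_regions`, `predict`, `lam`, `sigma`, `m`) are hypothetical. A point is treated as a degenerate interval with equal endpoints, in which case the sketch reduces to ordinary kernel ridge regression.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=0.5):
    # Pairwise Gaussian kernel between two 1-D arrays of inputs (assumed kernel).
    x = np.asarray(x, dtype=float)[:, None]
    z = np.asarray(z, dtype=float)[None, :]
    return np.exp(-(x - z) ** 2 / (2.0 * sigma ** 2))

def region_nodes(a, b, m=50):
    # Uniform grid on [a, b]; for a == b all nodes coincide (a labeled point).
    return np.linspace(a, b, m)

def fit_regions(intervals, labels, lam=1e-3, sigma=0.5, m=50):
    # Each region contributes one basis function: the kernel averaged over the region.
    nodes = [region_nodes(a, b, m) for a, b in intervals]
    n = len(nodes)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Kernel averaged over both regions i and j.
            gram[i, j] = gaussian_kernel(nodes[i], nodes[j], sigma).mean()
    # Regularized least squares on the region averages (assumed loss).
    alpha = np.linalg.solve(gram + lam * np.eye(n), np.asarray(labels, dtype=float))
    return nodes, alpha

def predict(x, nodes, alpha, sigma=0.5):
    # f(x) = sum_i alpha_i * (average over region i of k(x, z)).
    x = np.atleast_1d(np.asarray(x, dtype=float))
    feats = np.stack([gaussian_kernel(x, z, sigma).mean(axis=1) for z in nodes], axis=1)
    return feats @ alpha

# Two labeled intervals and one labeled point (degenerate interval).
intervals = [(-2.0, -1.0), (1.0, 2.0), (0.0, 0.0)]
labels = [-1.0, 1.0, 0.0]
nodes, alpha = fit_regions(intervals, labels)
print(predict([-1.5, 0.0, 1.5], nodes, alpha))
```

In this sketch the fitted function is a linear combination of one region-averaged kernel per labeled region, which mirrors the form stated by the representer theorems; the specific kernel and loss are illustrative choices only.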
