Foundations of Support Constraint Machines

The mathematical foundations of a new theory for the design of intelligent agents are presented. The proposed learning paradigm is centered around the concept of constraint, representing the interactions with the environment, and the parsimony principle. The classical regularization framework of kernel machines is naturally extended to the case in which the agents interact with a richer environment, where abstract granules of knowledge, compactly described by different linguistic formalisms, can be translated into the unified notion of constraint for defining the hypothesis set. Constrained variational calculus is exploited to derive general representation theorems that provide a description of the optimal body of the agent (i.e., the functional structure of the optimal solution to the learning problem), which is the basis for devising new learning algorithms. We show that regardless of the kind of constraints, the optimal body of the agent is a support constraint machine (SCM) based on representer theorems that extend classical results for kernel machines and provide new representations. In a sense, the expressiveness of constraints yields a semantic-based regularization theory, which strongly restricts the hypothesis set of classical regularization. Some guidelines to unify continuous and discrete computational mechanisms are given so as to accommodate in the same framework various kinds of stimuli, for example, supervised examples and logic predicates. The proposed view of learning from constraints incorporates classical learning from examples and extends naturally to the case in which the examples are subsets of the input space, which is related to learning propositional logic clauses.
