Learning with Kernels and Logical Representations

Choosing an appropriate kernel function is a fundamental step in applying many popular statistical learning algorithms. Kernels are the natural entry point for inserting prior knowledge into the learning process. Inductive logic programming (ILP), on the other hand, offers a powerful and flexible framework for describing existing background knowledge and extracting additional knowledge from the data. It therefore seems natural to explore the synergy between these two important paradigms of machine learning. In this extended abstract (see [1] for a longer version), I briefly review some of our recent work on statistical learning with kernel machines in the ILP setting.
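As a minimal illustration of how a kernel can carry knowledge expressed in a logical representation, the sketch below treats each example as a set of ground facts and counts the facts two examples share. This is not one of the kernels reviewed in the work cited here; the function name `fact_kernel`, the toy molecules, and their facts are all hypothetical. The count is a valid kernel because it equals the inner product of the binary indicator vectors of the two fact sets.

```python
def fact_kernel(x, y):
    """Set-intersection kernel: number of ground facts shared by x and y."""
    return len(set(x) & set(y))


# Hypothetical toy data: each example is described by a set of ground facts,
# written as tuples (predicate, arg1, ...).
examples = {
    "m1": {("atom", "c"), ("atom", "o"), ("bond", "c", "o")},
    "m2": {("atom", "c"), ("atom", "n"), ("bond", "c", "n")},
    "m3": {("atom", "c"), ("atom", "o"), ("bond", "c", "o"), ("ring", "benzene")},
}

# Gram matrix over the examples in a fixed (sorted) order; any kernel machine
# that accepts a precomputed Gram matrix could consume it.
names = sorted(examples)
gram = [[fact_kernel(examples[a], examples[b]) for b in names] for a in names]

for name, row in zip(names, gram):
    print(name, row)
```

Changing which facts are included in each description (for instance, adding facts derived from background knowledge) changes the Gram matrix, which is the sense in which the kernel acts as an entry point for prior knowledge.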

[1]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[2]  Jude W. Shavlik,et al.  Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction , 2004, ILP.

[3]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[4]  Tatsuya Akutsu,et al.  Extensions of marginalized graph kernels , 2004, ICML.

[5]  Luc De Raedt,et al.  Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting , 2006, J. Mach. Learn. Res..

[6]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[7]  J. W. Lloyd.  Logic and Learning , 2003 .

[8]  Alexander J. Smola,et al.  Fast Kernels for String and Tree Matching , 2002, NIPS.

[9]  Thomas Gärtner,et al.  A survey of kernels for structured data , 2003, SKDD.

[10]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[11]  Thomas Gärtner,et al.  Kernels for structured data , 2008, Series in Machine Perception and Artificial Intelligence.

[12]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[13]  Nelson Goodman,et al.  The calculus of individuals and its uses , 1940, Journal of Symbolic Logic.

[14]  Maurice Bruynooghe,et al.  A polynomial time computable metric between point sets , 2001, Acta Informatica.

[15]  Ben Taskar,et al.  Discriminative Probabilistic Models for Relational Data , 2002, UAI.

[16]  Luc De Raedt,et al.  Logical and Relational Learning: From ILP to MRDM (Cognitive Technologies) , 2008 .

[17]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[18]  Felipe Cucker,et al.  On the mathematical foundations of learning , 2001 .

[19]  Alex M. Andrew,et al.  Logic for Learning: Learning Comprehensible Theories from Structured Data , 2004 .

[20]  Roberto Casati,et al.  Parts and Places: The Structures of Spatial Representation , 1999 .

[21]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[22]  Jan Ramon.  Thesis: clustering and instance based learning in first order logic , 2002 .

[23]  Stephen Muggleton,et al.  Support Vector Inductive Logic Programming , 2005, Discovery Science.

[24]  Luc De Raedt,et al.  Kernels and Distances for Structured Data , 2008 .

[25]  Mehryar Mohri,et al.  Rational Kernels: Theory and Algorithms , 2004, J. Mach. Learn. Res..

[26]  Thomas Gärtner,et al.  Cyclic pattern kernels for predictive graph mining , 2004, KDD.

[27]  Uday S. Reddy,et al.  Typed Prolog: A Semantic Reconstruction of the Mycroft-O'Keefe Type System , 1991, ISLP.

[28]  Stephen Muggleton,et al.  The Effect of Relational Background Knowledge on Learning of Protein Three-Dimensional Fold Signatures , 2001, Machine Learning.

[29]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[30]  M. Kirsten,et al.  Distance based approaches to relational learning and clustering , 2001 .

[31]  Stefan Wrobel,et al.  Relational Instance-Based Learning with Lists and Terms , 2001, Machine Learning.

[32]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[33]  V. Rich.  Personal communication , 1989, Nature.

[34]  Bernhard Schölkopf,et al.  Learning Theory and Kernel Machines , 2003, Lecture Notes in Computer Science.

[35]  T. Poggio,et al.  The Mathematics of Learning: Dealing with Data , 2005, 2005 International Conference on Neural Networks and Brain.

[36]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[37]  Yi Lin,et al.  Support Vector Machines and the Bayes Rule in Classification , 2002, Data Mining and Knowledge Discovery.

[38]  Michael I. Jordan,et al.  Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates , 2003, NIPS.

[39]  Dan Roth,et al.  Learning with Feature Description Logics , 2002, ILP.

[40]  Ben Taskar,et al.  Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning) , 2007 .

[41]  Alessio Micheli,et al.  Application of Cascade Correlation Networks for Structures to Chemistry , 2004, Applied Intelligence.

[42]  Charles A. Micchelli,et al.  Learning the Kernel Function via Regularization , 2005, J. Mach. Learn. Res..

[43]  Bernhard Schölkopf,et al.  Some kernels for structured data , 2001 .

[44]  Saso Dzeroski,et al.  Experiments in Predicting Biodegradability , 1999, ILP.

[45]  Luc De Raedt,et al.  kFOIL: Learning Simple Relational Kernels , 2006, AAAI.

[46]  Paolo Frasconi,et al.  Weighted decomposition kernels , 2005, ICML.

[47]  J. W. Lloyd,et al.  Logic for Learning , 2003, Cognitive Technologies.

[48]  J. R. Quinlan.  Learning Logical Definitions from Relations , 1990 .

[49]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[50]  Hisashi Kashima,et al.  Marginalized Kernels Between Labeled Graphs , 2003, ICML.

[51]  Ehud Shapiro,et al.  Algorithmic Program Debugging , 1983 .

[52]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[53]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[54]  Hava T. Siegelmann,et al.  Support Vector Clustering , 2002, J. Mach. Learn. Res..

[55]  Luc De Raedt,et al.  Towards Learning Stochastic Logic Programs from Proof-Banks , 2005, AAAI.

[56]  Stefan Kramer,et al.  Structural Regression Trees , 1996, AAAI/IAAI, Vol. 1.

[57]  Klaus Obermayer,et al.  Support vector learning for ordinal regression , 1999 .

[58]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[59]  Ashwin Srinivasan,et al.  Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction , 1996, Artif. Intell..

[60]  Stephen Muggleton,et al.  Multi-class Prediction Using Stochastic Logic Programs , 2007, ILP.

[62]  Tom M. Mitchell,et al.  Learning by experimentation: acquiring and refining problem-solving heuristics , 1993 .

[63]  Robert P. W. Duin,et al.  Support vector domain description , 1999, Pattern Recognit. Lett..

[64]  Alan W. Biermann,et al.  Constructing Programs from Example Computations , 1976, IEEE Transactions on Software Engineering.

[65]  Gerhard Widmer,et al.  Prediction of Ordinal Classes Using Regression Trees , 2001, Fundam. Informaticae.

[66]  Mark Craven,et al.  Representing Sentence Structure in Hidden Markov Models for Information Extraction , 2001, IJCAI.

[67]  Paolo Frasconi,et al.  Kernels on Prolog Ground Terms , 2005, IJCAI.

[68]  Raymond J. Mooney,et al.  Combining FOIL and EBG to Speed-up Logic Programs , 1993, IJCAI.

[69]  Stanisław Leśniewski.  Podstawy ogólnej teoryi mnogości. I. : (Część. Ingredyens. Mnogość. Klasa. Element. Podmnogość. Niektóre ciekawe rodzaje klas.) , 1916 .

[70]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[71]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[72]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[73]  Jan Ramon,et al.  Clustering and instance based learning in first order logic , 2002, AI Communications.

[74]  G. Wahba,et al.  A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines , 1970 .

[75]  Alexander J. Smola,et al.  Hyperkernels , 2002, NIPS.

[76]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[77]  Dan Roth,et al.  On Kernel Methods for Relational Learning , 2003, ICML.

[78]  J. Ramon,et al.  A Framework for Defining Distances between First-Order Logic Objects , 1998 .

[79]  B. Ripley,et al.  Pattern Recognition , 1968, Nature.

[80]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[81]  John R. Anderson,et al.  Machine Learning: An Artificial Intelligence Approach , 2009 .

[82]  Peter A. Flach,et al.  Propositionalization approaches to relational data mining , 2001 .

[83]  Jennifer Neville,et al.  Collective Classification with Relational Dependency Networks , 2003 .

[84]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[85]  Luc De Raedt,et al.  nFOIL: Integrating Naïve Bayes and FOIL , 2005, AAAI.