Similarity Functions for Structured Data. An Application to Decision Trees

Learning from structured data is becoming increasingly important. Besides the well-known approaches that deal directly with complex data representations (inductive logic programming and multi-relational data mining), new techniques have recently been proposed that upgrade propositional learning algorithms. Among these, distance-based methods are extended by incorporating similarity functions defined over structured domains; for instance, a k-NN algorithm can solve a graph classification problem once a distance between graphs is available. Since a measure between objects is the essential component of this kind of method, this paper starts with a description of some recent similarity functions defined over common structured data types (lists, sets, terms, etc.). However, many of the most common classification techniques, such as decision tree learning, are not distance-based methods and cannot be directly adapted to become so (as kernel methods and neural networks have been). In this work, we extend decision trees to use any kind of similarity function. The method is inspired by centre splitting, which constructs decision trees by defining splits based on the distance to two or more centroids. We include an experimental analysis with both propositional data and complex data. Beyond its own advantages, the proposed method serves as an example of how other partition-based methods can be adapted to work with distances and, hence, with structured data.
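The idea of splitting on the distance to a set of centres can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in the paper: the names `edit_distance`, `medoid`, `build` and `predict` are hypothetical, the choice of one centre per class is an assumption, and each centre is taken to be a medoid (the training object with the smallest total distance to the others in its class), since an averaged centroid is not generally defined for lists, sets or terms. Any distance function over the objects can be plugged in; an edit distance over sequences is used here only as an example.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def medoid(items, dist):
    """Return the item with the smallest total distance to all items."""
    return min(items, key=lambda c: sum(dist(c, x) for x in items))

def nearest(x, centres, dist):
    """Index of the centre closest to x (ties go to the first centre)."""
    return min(range(len(centres)), key=lambda i: dist(x, centres[i]))

def build(examples, dist, min_size=2):
    """Grow a tree from (object, label) pairs; each split has one centre per class."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(examples) < min_size:
        return {"leaf": majority}
    # Assumption: one centre (medoid) per class present at this node.
    centres = [medoid([x for x, y in examples if y == c], dist)
               for c in sorted(set(labels))]
    # Send each example to the branch of its nearest centre.
    parts = [[] for _ in centres]
    for x, y in examples:
        parts[nearest(x, centres, dist)].append((x, y))
    # Stop if the split does not separate anything.
    if any(len(p) == len(examples) for p in parts):
        return {"leaf": majority}
    children = [build(p, dist, min_size) if p else {"leaf": majority}
                for p in parts]
    return {"centres": centres, "children": children}

def predict(tree, x, dist):
    """Follow the nearest-centre branch at each node until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["children"][nearest(x, tree["centres"], dist)]
    return tree["leaf"]

# Toy usage with strings as the structured objects.
train = [("aaaa", "a"), ("aaab", "a"), ("aaba", "a"),
         ("bbbb", "b"), ("bbba", "b"), ("babb", "b")]
tree = build(train, edit_distance)
print(predict(tree, "aaa", edit_distance), predict(tree, "abbb", edit_distance))
```

Because the tree only ever calls `dist`, swapping `edit_distance` for a distance over sets, terms or graphs requires no change to `build` or `predict`, which is the property the abstract relies on.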
