Generalisation Operators for Lists Embedded in a Metric Space

In some application areas, distances and similarity measures quantify how alike two objects are, and these measurements are then used to find related objects, to cluster a set of objects, to make classifications or to perform an approximate search guided by the distance. In many other application areas, patterns are required to describe the similarities in the data. These patterns are usually constructed through generalisation (or specialisation) operators. Distances can be defined for every data structure; in fact, different distances exist for sets, lists, atoms, numbers, ontologies, web pages, etc. Pattern languages, and generalisation operators over them, can be defined as well. However, for many data structures, distances and generalisation operators are not consistent with each other. For instance, for lists (or sequences), edit distances are not consistent with regular languages: a regular pattern such as *a covers lists that may be far apart in terms of the edit distance (e.g. bbbbbba and aa). In this paper we investigate how, given a pattern language, a generalisation operator and a distance can be defined so that they form a consistent pair. We define the notion of (minimal) distance-based generalisation operators for lists, and we illustrate positive results with two different pattern languages.
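
The mismatch between the pattern *a and the edit distance can be checked with a short Python sketch. It is an illustration only and is not taken from the paper. It assumes that the edit distance is the standard Levenshtein distance with unit costs, and that *a denotes "any list whose last element is a". The names edit_distance and covered_by_star_a are hypothetical helpers introduced here.

```python
def edit_distance(xs, ys):
    """Levenshtein distance with unit-cost insert, delete and substitute (assumed)."""
    prev = list(range(len(ys) + 1))          # distances from the empty prefix of xs
    for i, x in enumerate(xs, start=1):
        curr = [i]                           # deleting the first i elements of xs
        for j, y in enumerate(ys, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,         # delete x
                            curr[j - 1] + 1,     # insert y
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]


def covered_by_star_a(xs):
    """Assumed reading of the pattern *a: any list ending in 'a'."""
    return len(xs) > 0 and xs[-1] == "a"


# Both lists from the abstract are covered by *a, yet they are far apart.
print(covered_by_star_a("bbbbbba"), covered_by_star_a("aa"))  # True True
print(edit_distance("bbbbbba", "aa"))                         # 6

# A list outside the pattern can be much closer to aa.
print(covered_by_star_a("ab"), edit_distance("aa", "ab"))     # False 1
```

Under these assumptions, the two covered lists are at distance 6, while ab, which the pattern does not cover, is at distance 1 from aa. This is the kind of inconsistency between a generalisation and a distance that the abstract describes.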
