Term Comparisons in First-Order Similarity Measures

The similarity measures used so far in first-order instance-based learning (IBL) have been limited to the function-free case. In this paper we show that substantial predictive power can be gained by allowing lists and other terms in the input representation and by designing similarity measures that work directly on these structures. We present an improved similarity measure for the first-order instance-based learner Ribl that employs edit distances to compute distances between lists and terms efficiently. We discuss its computational and formal properties, and show that it is empirically superior by a wide margin on a problem from the domain of biochemistry.
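To make the notion of an edit distance between lists concrete, the following is a minimal sketch of the classic dynamic-programming formulation for sequences. It is not the measure defined for Ribl: the function names, the unit insertion, deletion, and substitution costs, and the normalization by the longer list's length are all assumptions chosen for illustration, and the extension from lists to arbitrary terms is not shown.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def list_edit_distance(
    a: Sequence[T],
    b: Sequence[T],
    sub_cost: Callable[[T, T], float] = lambda x, y: 0.0 if x == y else 1.0,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of insertions, deletions, and substitutions
    that turn list `a` into list `b` (illustrative unit costs by default)."""
    # prev[j] is the cost of editing the current prefix of `a` into b[:j];
    # initially the prefix is empty, so only insertions are needed.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        # Editing a[:i] into the empty list takes i deletions.
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[T], b: Sequence[T]) -> float:
    """Scale the unit-cost distance into [0, 1] (hypothetical normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else list_edit_distance(a, b) / longest
```

The `sub_cost` parameter is the hook where a similarity between list elements could be plugged in, so that replacing one element by a similar one costs less than replacing it by an unrelated one. The running time is proportional to the product of the two list lengths.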
