Characteristic Sets for Inferring the Unions of the Tree Pattern Languages by the Most Fitting Hypotheses

A tree pattern $p$ is a first-order term in formal logic, and the language of $p$ is the set of all tree patterns obtainable by replacing each variable in $p$ with a tree pattern containing no variables. We consider the inductive inference of unions of these languages from positive examples, using strategies that guarantee certain forms of minimality during the learning process. By a result in our earlier work, if every language in a class $\mathcal{L}$ has a characteristic set within $\mathcal{L}$, then $\mathcal{L}$ can be identified in the limit by a learner that simply conjectures a hypothesis that contains the examples and is minimal in its number of elements of up to an appropriate size. Furthermore, suppose there is a size $l$ such that every candidate hypothesis has a characteristic set consisting only of elements of size at most $l$, relative to the languages in $\mathcal{L}$ that have a non-empty intersection with the examples. Then the hypotheses containing the fewest elements of size at most $l$ are also minimal with respect to inclusion. In this paper we show how to determine such a size $l$ for the unions of the tree pattern languages, thereby allowing us to learn this class using hypotheses that satisfy both notions of minimality.
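The following is a minimal brute-force sketch of the objects described above, not the authors' algorithm. It assumes a hypothetical ranked alphabet `SIGMA` (constants `a`, `b` and a binary symbol `f`), measures the size of a tree as its number of nodes, and represents variables as strings such as `"?x"`. A tree is in the language of a pattern if some consistent substitution (repeated variables receive the same tree) turns the pattern into it. The hypothetical helper `most_fitting` takes a hand-supplied list of candidate unions and returns one that contains every example and has the fewest elements of size at most `l`; generating the candidates and choosing `l` soundly, which is what the paper addresses, are not modeled here, and `l = 5` is an arbitrary choice.

```python
from functools import lru_cache
from itertools import product

# Hypothetical ranked alphabet (symbol -> arity); not taken from the paper.
SIGMA = {"a": 0, "b": 0, "f": 2}

# A term is a variable (a str such as "?x") or a tuple (symbol, *children).
def is_var(t):
    return isinstance(t, str)

def compositions(total, k):
    """Yield every k-tuple of positive integers summing to total."""
    if k == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - k + 2):
        for rest in compositions(total - first, k - 1):
            yield (first, *rest)

@lru_cache(maxsize=None)
def ground_terms(n):
    """All variable-free terms over SIGMA with exactly n nodes (assumed size measure)."""
    out = []
    for sym, arity in SIGMA.items():
        if arity == 0:
            if n == 1:
                out.append((sym,))
            continue
        for sizes in compositions(n - 1, arity):
            for kids in product(*(ground_terms(s) for s in sizes)):
                out.append((sym, *kids))
    return tuple(out)

def match(p, t, binding):
    """True if ground term t is obtained from pattern p by a consistent substitution."""
    if is_var(p):
        if p in binding:
            return binding[p] == t  # a repeated variable must map to the same tree
        binding[p] = t
        return True
    if p[0] != t[0] or len(p) != len(t):
        return False
    return all(match(pc, tc, binding) for pc, tc in zip(p[1:], t[1:]))

def in_union(hyp, t):
    """Membership of t in the union of the languages of the patterns in hyp."""
    return any(match(p, t, {}) for p in hyp)

def count_up_to(hyp, l):
    """Number of elements of size at most l in the union language of hyp."""
    return sum(1 for n in range(1, l + 1) for t in ground_terms(n) if in_union(hyp, t))

def most_fitting(examples, candidates, l):
    """Among candidates containing every example, return one with the fewest
    elements of size <= l. Raises ValueError if no candidate is consistent."""
    consistent = [h for h in candidates if all(in_union(h, e) for e in examples)]
    return min(consistent, key=lambda h: count_up_to(h, l))

def f(x, y):
    return ("f", x, y)

A, B = ("a",), ("b",)
examples = [f(A, A), f(B, B)]
H1 = ["?x"]                      # every ground tree
H2 = [f("?x", "?y")]             # every tree rooted at f
H3 = [f("?x", "?x")]             # f applied to two identical subtrees
H4 = [f("?x", A), f(B, "?y")]    # a union of two tree patterns
best = most_fitting(examples, [H1, H2, H3, H4], l=5)
print(best, [count_up_to(h, 5) for h in (H1, H2, H3, H4)])
# prints [('f', '?x', '?x')] [22, 20, 2, 11]
```

In this toy run every candidate contains both examples, and `H3` is selected because it has only 2 elements of size at most 5. Its language is also contained in those of `H1` and `H2`, which illustrates how count-minimality and inclusion-minimality can coincide; whether they coincide in general depends on choosing `l` as described above. The enumeration is exponential in `l` and is meant only to make the definitions concrete.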
