Learning a Subclass of Regular Patterns in

An algorithm for learning a subclass of erasing regular pattern languages is presented. On extended regular pattern languages generated by patterns of the form $x_0 \alpha_1 x_1 \ldots \alpha_m x_m$, where $x_0, \ldots, x_m$ are variables and $\alpha_1, \ldots, \alpha_m$ are strings of terminals of length $c$ each, it runs with arbitrarily high probability of success using a number of examples polynomial in $m$ (and exponential in $c$). It is assumed that $m$ is unknown but $c$ is known, and that samples are drawn randomly according to some distribution, for which we require only that it has certain natural and plausible properties. Aiming to improve this algorithm further, we also explore computer simulations of a heuristic.
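
To make the pattern class concrete, the following is a minimal sketch of a membership test for the erasing language of such a pattern. It is not the learning algorithm described above. It only illustrates what it means for a string to be generated when every variable may be replaced by any string, including the empty one. The function name and the representation of the pattern as a list of its constant terminal blocks are assumptions made for illustration.

```python
from typing import Sequence


def matches_erasing_regular_pattern(word: str, blocks: Sequence[str]) -> bool:
    """Check whether `word` lies in the erasing language of a regular pattern.

    `blocks` holds the constant terminal strings of the pattern, in order
    (hypothetical representation). Because each variable occurs once and may
    be erased, `word` is generated exactly when the blocks occur in it as
    non-overlapping substrings in the given order.
    """
    pos = 0  # first position not yet consumed by an earlier block
    for block in blocks:
        # Taking the leftmost occurrence is safe: it leaves the longest
        # possible suffix available for the remaining blocks.
        idx = word.find(block, pos)
        if idx < 0:
            return False
        pos = idx + len(block)
    return True
```

For example, with blocks `["ab", "cd"]` (so c = 2 and m = 2), the string `"xxabyycdz"` is accepted, while `"cdab"` is rejected because the blocks appear in the wrong order.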
