Discontinuities in pattern inference

This paper deals with the inferrability of classes of E-pattern languages (also referred to as extended or erasing pattern languages) from positive data in Gold's model of identification in the limit. The first main part of the paper shows that the recently presented negative result on terminal-free E-pattern languages over binary alphabets does not hold for other alphabet sizes; hence, the full class of these languages is inferrable from positive data if and only if the corresponding terminal alphabet does not consist of exactly two distinct letters. The second main part shows that the positive result on terminal-free E-pattern languages over alphabets with three or four letters cannot be extended to the class of general E-pattern languages. For larger alphabets, whether such an extension is possible remains open. The proof methods developed for these main results do not directly address the (non-)existence of appropriate learning strategies; instead, they deal with structural properties of classes of E-pattern languages and, in particular, with the problem of finding telltales for these languages. It is shown that the inferrability of classes of E-pattern languages is closely connected to certain problems on the ambiguity of morphisms, so that the technical contributions of the paper largely consist of combinatorial insights into morphisms in word monoids.
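To make the erasing semantics and the notion of morphism ambiguity concrete, the following is a minimal brute-force sketch in Python, not the paper's method. It assumes a terminal-free pattern written as a tuple of variable names and enumerates every substitution (morphism) σ, with variables allowed to map to the empty word, such that σ(α) = w. A word belongs to the E-pattern language of α exactly when at least one such σ exists; if two or more exist, each of them is ambiguous with respect to α, because another morphism differing on some variable of α produces the same word. The function names `substitutions` and `in_e_pattern_language` are hypothetical, and the search is exponential in the worst case, so it is meant only for small examples.

```python
from typing import Dict, Iterator, Sequence


def substitutions(pattern: Sequence[str], word: str) -> Iterator[Dict[str, str]]:
    """Yield every erasing substitution sigma with sigma(pattern) == word.

    Assumption (illustrative only): the pattern is terminal-free, so every
    symbol of `pattern` is a variable name; variables may map to "".
    """

    def extend(i: int, pos: int, sigma: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(pattern):
            # Whole pattern consumed: accept only if the whole word is, too.
            if pos == len(word):
                yield dict(sigma)
            return
        var = pattern[i]
        if var in sigma:
            # Variable already fixed: its image must reappear here verbatim.
            image = sigma[var]
            if word.startswith(image, pos):
                yield from extend(i + 1, pos + len(image), sigma)
            return
        # Fresh variable: try every possible image, including the empty word.
        for end in range(pos, len(word) + 1):
            sigma[var] = word[pos:end]
            yield from extend(i + 1, end, sigma)
            del sigma[var]

    yield from extend(0, 0, {})


def in_e_pattern_language(pattern: Sequence[str], word: str) -> bool:
    """Membership test: word is in L_E(pattern) iff some substitution exists."""
    return next(substitutions(pattern, word), None) is not None
```

For instance, `list(substitutions(("x1", "x1", "x2"), "aab"))` returns `[{'x1': '', 'x2': 'aab'}, {'x1': 'a', 'x2': 'b'}]`: the word aab lies in the E-pattern language of x1 x1 x2, and the morphism with σ(x1) = a and σ(x2) = b is ambiguous, since erasing x1 yields the same word. By contrast, `in_e_pattern_language(("x1", "x1"), "ab")` is `False`, because the E-pattern language of x1 x1 contains only squares.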
