Learning of regular expressions by pattern matching

We consider the problem of restoring regular expressions from good examples. We describe a natural learning algorithm that obtains a “plausible” regular expression from a single example. The algorithm repeatedly finds the longest substring that can be matched by some part of the expression obtained so far. We believe that, to a certain extent, the algorithm mimics how humans guess regular expressions from the same sort of examples. We show that, for regular expressions of bounded length, successful learning takes time linear in the length of the example, provided that the example is “good”. Under certain natural restrictions, the running time of the learning algorithm is also polynomial in unsuccessful cases. Finally, we discuss a computer experiment in learning regular expressions with the described algorithm, which shows that the proposed learning method is quite practical.
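To give a feel for guessing an expression from one example, the following Python sketch folds adjacent repetitions into starred groups, always trying the longest repeated block first. It is a simplified illustration and not the algorithm of the paper: it only detects literal adjacent repetitions rather than substrings matched by part of the current expression, it does not fold repetitions nested inside a group it has already built, and it assumes the example consists of letters and digits only. The names fold_once and guess_expression are hypothetical.

```python
import re


def fold_once(tokens):
    """Collapse the longest adjacent repetition; return None if there is none.

    Each token is either a single literal character or an already
    folded group such as "(ab)*" (an assumption of this sketch).
    """
    n = len(tokens)
    for size in range(n // 2, 0, -1):            # longest block first
        for start in range(n - 2 * size + 1):
            block = tokens[start:start + size]
            if tokens[start + size:start + 2 * size] != block:
                continue
            end = start + 2 * size
            while tokens[end:end + size] == block:   # absorb further copies
                end += size
            body = "".join(block)
            if size == 1 and body.endswith("*"):
                group = body                     # repeated x* is still x*
            elif len(body) == 1:
                group = body + "*"
            else:
                group = "(" + body + ")*"
            return tokens[:start] + [group] + tokens[end:]
    return None


def guess_expression(example):
    """Fold repetitions until none remain and return the guessed expression."""
    tokens = list(example)
    while True:
        folded = fold_once(tokens)
        if folded is None:
            return "".join(tokens)
        tokens = folded   # each fold shortens the token list, so this terminates


if __name__ == "__main__":
    guess = guess_expression("abababc")
    print(guess)                                  # (ab)*c
    print(bool(re.fullmatch(guess, "abababc")))   # True
```

For the example abababc the sketch returns (ab)*c, and the guessed expression still matches the example it was built from. Trying every block length at every position is quadratic or worse in the length of the example, so this sketch makes no attempt to reach the linear running time stated in the abstract.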
