Learning regular languages using RFSAs

Residual languages are important and natural components of regular languages, and several grammatical inference algorithms rely on this notion. In order to identify a given target language L, classical inference algorithms try to identify words that define identical residual languages of L. Here, we study whether it could be worthwhile to perform a finer analysis by identifying inclusion relations between the residual languages of L. We consider the class of Residual Finite State Automata (RFSAs). An RFSA A is a nondeterministic automaton whose states correspond to residual languages of the language L_A it recognizes. The inclusion relations between residual languages of L_A can be naturally materialized on A. We prove that the class of RFSAs is not polynomially characterizable. We conduct experiments showing that when a regular language is randomly drawn using a nondeterministic representation, the number of inclusion relations between its residual languages is very large. Moreover, its minimal RFSA representation is much smaller than its minimal DFA representation. Finally, we design a new learning algorithm, DeLeTe2, based on the search for inclusion relations between the residual languages of the target language. We give sufficient conditions for the identifiability of the target language. We experimentally compare the performance of DeLeTe2 to that of classical inference algorithms.
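To make the notions above concrete, here is a minimal Python sketch. It is not DeLeTe2 and not code from the paper. It assumes the language is given by an NFA over {a, b}, encoded with a hypothetical `NFA` class holding a transition dictionary. The residual of L by a word u is u^{-1}L = {v | uv ∈ L}; it equals the language accepted from the set of states reached by reading u, so distinct residuals can be enumerated with the subset construction and merged when their languages coincide. Inclusion between two residuals is tested by exploring pairs of state sets, and a residual is kept as prime when it is not the union of the residuals strictly included in it. Prime residuals are the ones that index the states of the canonical RFSA. All helper names (`included`, `residuals`, `prime_residuals`, `strict_inclusions`, `nth_from_end_nfa`) are illustrative only.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

StateSet = FrozenSet[int]


class NFA:
    """Hypothetical NFA encoding: delta maps (state, letter) to a set of states."""

    def __init__(self, alphabet: str, delta: Dict[Tuple[int, str], Set[int]],
                 initial: Iterable[int], final: Iterable[int]) -> None:
        self.alphabet = alphabet
        self.delta = delta
        self.initial = frozenset(initial)
        self.final = frozenset(final)

    def step(self, states: StateSet, letter: str) -> StateSet:
        # Set of states reachable from `states` by reading one letter.
        return frozenset(q2 for q in states for q2 in self.delta.get((q, letter), ()))

    def accepting(self, states: StateSet) -> bool:
        # The empty word belongs to the language of `states` iff it meets F.
        return bool(states & self.final)


def included(nfa: NFA, s: StateSet, t: StateSet) -> bool:
    """True iff the language accepted from s is a subset of the one accepted from t."""
    seen = set()
    stack = [(s, t)]
    while stack:
        x, y = stack.pop()
        if (x, y) in seen:
            continue
        seen.add((x, y))
        if nfa.accepting(x) and not nfa.accepting(y):
            return False  # some word w is accepted from s but not from t
        for letter in nfa.alphabet:
            stack.append((nfa.step(x, letter), nfa.step(y, letter)))
    return True


def residuals(nfa: NFA) -> List[StateSet]:
    """One representative state set per distinct residual u^{-1}L."""
    start = nfa.initial
    reachable, seen, queue = [], {start}, deque([start])
    while queue:  # subset construction: every reached set is some delta(I, u)
        s = queue.popleft()
        reachable.append(s)
        for letter in nfa.alphabet:
            nxt = nfa.step(s, letter)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    reps: List[StateSet] = []
    for s in reachable:  # merge state sets that accept the same language
        if not any(included(nfa, s, r) and included(nfa, r, s) for r in reps):
            reps.append(s)
    return reps


def strict_inclusions(nfa: NFA, reps: List[StateSet]) -> int:
    """Number of ordered pairs (R1, R2) of residuals with R1 strictly included in R2."""
    return sum(1 for r in reps for s in reps if r != s and included(nfa, r, s))


def prime_residuals(nfa: NFA, reps: List[StateSet]) -> List[StateSet]:
    """Residuals that are not the union of the residuals strictly included in them."""
    primes = []
    for s in reps:
        smaller = [r for r in reps if r != s and included(nfa, r, s)]
        # The language of a union of state sets is the union of their languages.
        union = frozenset().union(*smaller)
        if not included(nfa, s, union):
            primes.append(s)
    return primes


def nth_from_end_nfa(n: int) -> NFA:
    """Hypothetical example: NFA with n + 2 states for Sigma* a Sigma^n over {a, b}."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return NFA("ab", delta, {0}, {n + 1})


if __name__ == "__main__":
    nfa = nth_from_end_nfa(2)
    reps = residuals(nfa)
    print(len(reps), len(prime_residuals(nfa, reps)), strict_inclusions(nfa, reps))
    # prints: 8 4 19
```

On L = Σ*aΣ^n (the (n+1)-th letter from the end is a), the sketch finds 2^(n+1) distinct residuals, which is the size of the minimal DFA, but only n+2 prime residuals, while the number of strict inclusions grows as 3^(n+1) - 2^(n+1). Because it runs the full subset construction, the procedure is exponential in general; it illustrates the definitions rather than an efficient learning method.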
