Learning commutative deterministic finite state automata in polynomial time

Abstract. We consider the problem of learning the commutative subclass of regular languages in the on-line model of predicting {0,1}-valued functions from examples and reinforcements due to Littlestone [7, 4]. We show that the entire class of commutative deterministic finite state automata (CDFAs) over an arbitrary alphabet of size k is predictable in $O(s^k)$ time, with the worst-case number of mistakes bounded above by $O(s^k k \log s)$, where s is the number of states in the target DFA. As a corollary, this result implies that the class of CDFAs is also PAC-learnable from random labeled examples in time $O(s^k)$ with sample complexity $O\left( \tfrac{1}{\epsilon}\left( \log \tfrac{1}{\delta} + s^k k \log s \right) \right)$, using a different class of representations. The mistake bound of our algorithm is within a polynomial, for a fixed alphabet size, of the lower bound $O(s + k)$ we obtain by calculating the VC-dimension of the class. Our result also implies the predictability of the class of finite sets of commutative DFAs, representing the finite unions of the languages accepted by the respective DFAs.
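The abstract does not spell out the learning algorithm, but the structural fact it relies on is that a language accepted by a commutative DFA is closed under permutation of its symbols, so membership depends only on the Parikh vector of a string (the count of each alphabet symbol). The following is a minimal Python sketch of that property, using a hypothetical two-symbol commutative language that is not taken from the paper:

```python
# Illustration only (hypothetical example, not the paper's algorithm):
# for a commutative language, acceptance depends only on the Parikh vector,
# i.e., on how many times each alphabet symbol occurs, not on symbol order.

from collections import Counter
from itertools import permutations


def accepts(word: str) -> bool:
    """Example commutative language over {'a', 'b'}:
    accept iff the number of a's is even and the number of b's is divisible by 3."""
    counts = Counter(word)
    return counts["a"] % 2 == 0 and counts["b"] % 3 == 0


def invariant_under_reordering(word: str) -> bool:
    """Check that acceptance does not change when the symbols of `word` are permuted."""
    decisions = {accepts("".join(p)) for p in set(permutations(word))}
    return len(decisions) == 1


if __name__ == "__main__":
    for w in ["aabbb", "ababb", "ab", ""]:
        print(repr(w), accepts(w), invariant_under_reordering(w))
```

Because membership is determined by symbol counts alone, a learner only needs to classify Parikh vectors rather than arbitrary strings, which is what makes the class tractable for a fixed alphabet size.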

[1]  David Haussler, et al.  Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[2]  Leslie G. Valiant, et al.  Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[3]  Leslie G. Valiant, et al.  A theory of the learnable, 1984, CACM.

[4]  David Haussler, et al.  Predicting {0,1}-functions on randomly drawn points, 1988, COLT '88.

[5]  D. Angluin,  Queries and Concept Learning, 1988.

[6]  Leonard Pitt, et al.  Prediction-Preserving Reducibility, 1990, J. Comput. Syst. Sci.

[7]  David Haussler, et al.  Predicting {0,1}-functions on randomly drawn points, 1988, 29th Annual Symposium on Foundations of Computer Science (FOCS 1988).

[8]  Leonard Pitt, et al.  The minimum consistent DFA problem cannot be approximated within any polynomial, 1989, Proceedings of the Fourth Annual Structure in Complexity Theory Conference.

[9]  Vladimir Vapnik and Alexey Chervonenkis,  On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[10]  Naoki Abe,  Polynomial learnability of semilinear sets, 1989, COLT '89.

[11]  Leslie G. Valiant, et al.  A theory of the learnable, 1984, STOC '84.

[12]  Manfred K. Warmuth, et al.  Learning integer lattices, 1990, COLT '90.

[13]  Leonard Pitt, et al.  The minimum consistent DFA problem cannot be approximated within any polynomial, 1989, STOC '89.

[14]  Nick Littlestone, et al.  From on-line to batch learning, 1989, COLT '89.

[15]  N. Littlestone,  Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS 1987).