A Comparison of Rule Extraction for Different Recurrent Neural Network Models and Grammatical Complexity

It has been shown that rules can be extracted from highly non-linear, recursive models such as recurrent neural networks (RNNs). The RNN models investigated so far have mostly been Elman networks and second-order recurrent networks. Recently, new types of RNNs have demonstrated superior power on many machine learning tasks, especially those involving structured data, such as language modeling. Here, we empirically evaluate different recurrent models on the task of learning deterministic finite automata (DFAs), namely the seven Tomita grammars. We are interested in how well recurrent models with different architectures learn and express regular grammars, which can serve as building blocks for many applications that deal with structured data. Our experiments show that a second-order RNN provides the best and most stable DFA extraction performance across all Tomita grammars, whereas the performance of the other RNN models varies greatly from grammar to grammar. To better understand these results, we provide a theoretical analysis of the "complexity" of different grammars by introducing the entropy and the average edit distance of regular grammars, both defined in this paper. Through this analysis, we categorize the Tomita grammars into different classes, which explains the inconsistency in extraction performance observed across all RNN models.
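Since the evaluation targets are the seven Tomita grammars treated as DFAs, the sketch below (not taken from the paper) illustrates how one such grammar can be simulated to label binary strings by acceptance, which is the usual way training data for RNN grammar induction is generated. It assumes the standard definition of Tomita grammar 4: binary strings that do not contain "000" as a substring.

```python
# Illustrative sketch only: a hand-written DFA for Tomita grammar 4,
# used here to label binary strings as accepted or rejected.

def tomita4_accepts(string: str) -> bool:
    """Simulate the Tomita-4 DFA: reject any string containing three consecutive 0s."""
    zeros_in_a_row = 0
    for symbol in string:
        if symbol == "0":
            zeros_in_a_row += 1
            if zeros_in_a_row == 3:
                return False  # entered the dead (rejecting) state
        elif symbol == "1":
            zeros_in_a_row = 0
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return True

if __name__ == "__main__":
    # Positive/negative labels like these form the training set
    # from which an RNN is expected to learn (and later yield) the DFA.
    for s in ["", "1101", "1000", "010010"]:
        print(s or "<empty>", "->", "accept" if tomita4_accepts(s) else "reject")
```

A learned RNN would be trained on such labeled strings, and rule extraction then attempts to recover a DFA equivalent to this ground-truth automaton from the trained network's hidden-state dynamics.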
