A Comparative Study of Rule Extraction for Recurrent Neural Networks

Understanding recurrent networks through rule extraction has a long history. It has recently attracted renewed interest due to the need to interpret and verify neural networks. One basic form for representing stateful rules is the deterministic finite automaton (DFA). Previous research has shown that extracting DFAs from trained second-order recurrent networks is not only possible but also relatively stable. Recently, several new types of recurrent networks with more complicated architectures have been introduced to handle challenging learning tasks, usually involving sequential data. However, it remains an open problem whether DFAs can be adequately extracted from these models. Specifically, it is not clear how DFA extraction is affected when applied to different recurrent networks trained on data sets with different levels of complexity. Here, we investigate DFA extraction on several widely adopted recurrent networks trained to learn the set of seven regular Tomita grammars. We first formally analyze the complexity of the Tomita grammars and categorize them accordingly. We then empirically evaluate the DFA extraction performance of the different recurrent networks on all Tomita grammars. Our experiments show that for most recurrent networks, extraction performance decreases as the complexity of the underlying grammar increases. On grammars of lower complexity, most recurrent networks achieve desirable extraction performance; on the grammars of highest complexity, several complicated models fail entirely, and only certain recurrent networks attain satisfactory extraction performance.
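
For reference, the seven Tomita grammars are standard benchmark regular languages over the binary alphabet {0, 1}. The following is a minimal Python sketch of their membership conditions as they are usually stated in the grammatical-inference literature; it is illustrative only and not code from the paper.

```python
import re

# Membership predicates for the seven Tomita grammars, following their
# standard definitions in the literature (an illustrative sketch).

def tomita_1(s):  # strings of the form 1*
    return "0" not in s

def tomita_2(s):  # strings of the form (10)*
    return re.fullmatch(r"(10)*", s) is not None

def tomita_3(s):
    # Forbid a maximal run of an odd number of 1s immediately followed
    # by a maximal run of an odd number of 0s.
    runs = re.findall(r"0+|1+", s)
    return not any(a[0] == "1" and len(a) % 2 == 1 and
                   b[0] == "0" and len(b) % 2 == 1
                   for a, b in zip(runs, runs[1:]))

def tomita_4(s):  # no substring of three consecutive 0s
    return "000" not in s

def tomita_5(s):  # even number of 0s and even number of 1s
    return s.count("0") % 2 == 0 and s.count("1") % 2 == 0

def tomita_6(s):  # (#0s - #1s) divisible by 3
    return (s.count("0") - s.count("1")) % 3 == 0

def tomita_7(s):  # strings of the form 0*1*0*1*
    return re.fullmatch(r"0*1*0*1*", s) is not None
```

Likewise, the quantization-style extraction referenced above (clustering a trained network's hidden states and reading transitions off the clusters) can be summarized in a short sketch. The `rnn.step` and `rnn.accepts` calls are hypothetical placeholders for a trained network's state update and output readout, and k-means is only one possible quantizer; this follows the general recipe of the second-order RNN extraction work rather than any specific implementation.

```python
from itertools import product
import numpy as np
from sklearn.cluster import KMeans

def extract_dfa(rnn, h0, alphabet="01", max_len=8, n_states=10):
    # 1. Run the trained network on all short strings, logging every
    #    hidden-state transition (h, symbol, h') that occurs.
    states, trace = [h0], []
    for length in range(1, max_len + 1):
        for word in product(alphabet, repeat=length):
            h = h0
            for a in word:
                h_next = rnn.step(h, a)   # hypothetical transition call
                trace.append((h, a, h_next))
                states.append(h_next)
                h = h_next
    # 2. Quantize the continuous hidden-state space into DFA states.
    km = KMeans(n_clusters=n_states).fit(np.array(states))
    q = lambda h: int(km.predict(np.array([h]))[0])
    # 3. Read transitions off the clusters; conflicting observations are
    #    resolved here by last write (real extractors are more careful).
    delta = {(q(h), a): q(h_next) for h, a, h_next in trace}
    # 4. A cluster is accepting if the network accepts from its members.
    accept = {q(h) for h in states if rnn.accepts(h)}  # hypothetical readout
    return q(h0), delta, accept
```

The extracted tuple (initial state, transition map, accepting set) can then be minimized with a standard DFA minimization algorithm and compared against the target grammar.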
