An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks

Rule extraction from black-box models is critical in domains that require model validation before deployment, such as credit scoring and medical diagnosis. Although already a challenging problem for statistical learning models in general, extraction becomes even harder when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to the data. Here, we study the extraction of production rules from second-order RNNs trained to recognize the Tomita grammars. We show that such rules can be stably extracted from trained RNNs and that, in certain cases, the extracted rules outperform the RNNs from which they were derived.
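As background for the setup described above, the following is a minimal sketch of the two ingredients involved: a second-order recurrent cell, in which the next hidden state depends on the *product* of the current state and a one-hot input symbol, and a membership test for one of the Tomita grammars. All names, dimensions, and initialization choices here are illustrative assumptions, not the paper's exact architecture or training procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SecondOrderRNN:
    """Sketch of a second-order recurrent cell:
    h_t[i] = sigmoid( sum_{j,k} W[i,j,k] * h_{t-1}[j] * x_t[k] ),
    i.e. the weight tensor couples state and input multiplicatively.
    Weights are random here; a real experiment would train them."""

    def __init__(self, n_states=4, n_symbols=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.5, size=(n_states, n_states, n_symbols))
        self.h0 = np.zeros(n_states)
        self.h0[0] = 1.0  # fixed initial state

    def run(self, string):
        h = self.h0
        for ch in string:
            x = np.eye(self.W.shape[2])[int(ch)]  # one-hot encode the symbol
            # Contract the 3-way tensor with state and input vectors.
            h = sigmoid(np.einsum('ijk,j,k->i', self.W, h, x))
        return h  # e.g. threshold h[0] at 0.5 for accept/reject

def tomita1(string):
    """Tomita grammar #1: strings over {0,1} consisting only of 1s (1*)."""
    return '0' not in string

# Example: run the (untrained) cell on a string and label training data.
rnn = SecondOrderRNN()
final_state = rnn.run("1101")
```

Rule extraction would then proceed by clustering or quantizing the hidden-state vectors visited over many strings and reading off the transitions between clusters as a finite automaton; the sketch above only fixes the dynamics on which such a procedure operates.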
