Extraction of rules from discrete-time recurrent neural networks

The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues: both allow information to be exchanged between symbolic and connectionist knowledge representations. This paper focuses on the quality of the rules extracted from recurrent neural networks. Discrete-time recurrent neural networks can be trained to correctly classify strings of a regular language. Rules defining the learned grammar can then be extracted from such networks in the form of deterministic finite-state automata (DFAs) by applying clustering algorithms to the output space of the recurrent state neurons. Our algorithm can extract, from the same network, different finite-state automata that are all consistent with the training set. We compare the generalization performance of these different models with that of the trained network, and we introduce a heuristic for choosing, among the consistent DFAs, the model that best approximates the learned regular grammar.
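To make the extraction procedure concrete, the sketch below shows the standard quantize-and-search approach in Python: the continuous activations of the state neurons are partitioned into q intervals per neuron, each visited partition cell becomes a DFA state, and transitions are discovered by driving the network over the input alphabet. This is a minimal illustration, not the paper's implementation: the network is a toy second-order recurrent net with fixed random weights standing in for a trained one, and the names (rnn_step, quantize, extract_dfa) and the acceptance threshold on a designated response neuron are illustrative assumptions.

```python
import numpy as np

# -- Hypothetical stand-in for a trained recurrent network ----------------
# In the paper the network is trained to classify strings of a regular
# language; here fixed random weights let the sketch run end to end.
rng = np.random.default_rng(0)
N_STATE, ALPHABET = 3, "01"
W = rng.normal(size=(len(ALPHABET), N_STATE, N_STATE))  # second-order weights
b = rng.normal(size=N_STATE)

def rnn_step(state, symbol):
    """One step of the (hypothetical) recurrent network dynamics."""
    k = ALPHABET.index(symbol)
    return np.tanh(W[k] @ state + b)

def accepts(state):
    """Treat neuron 0 as the response neuron: accept if activation > 0."""
    return state[0] > 0.0

# -- DFA extraction by quantizing the state space -------------------------
def quantize(state, q):
    """Map each neuron's activation in (-1, 1) to one of q intervals,
    turning the continuous state into a discrete cluster label."""
    return tuple(np.minimum(((state + 1.0) / 2.0 * q).astype(int), q - 1))

def extract_dfa(q=2):
    """Search over quantized states: each newly visited cluster becomes
    a DFA state; transitions follow the network dynamics."""
    s0 = np.zeros(N_STATE)
    start = quantize(s0, q)
    states = {start: 0}              # cluster label -> DFA state id
    rep = {start: s0}                # representative continuous state
    delta, accepting, frontier = {}, set(), [start]
    if accepts(s0):
        accepting.add(0)
    while frontier:
        c = frontier.pop()
        for a in ALPHABET:
            nxt = rnn_step(rep[c], a)
            cn = quantize(nxt, q)
            if cn not in states:     # new cluster -> new DFA state
                states[cn] = len(states)
                rep[cn] = nxt
                frontier.append(cn)
                if accepts(nxt):
                    accepting.add(states[cn])
            delta[(states[c], a)] = states[cn]
    return states, delta, accepting

states, delta, accepting = extract_dfa(q=2)
print(f"{len(states)} DFA states, accepting: {sorted(accepting)}")
for (s, a), t in sorted(delta.items()):
    print(f"  d({s}, '{a}') = {t}")
```

Varying the quantization level q changes the partition granularity and therefore the extracted automaton; in this spirit, running the extraction for several values of q (and minimizing each resulting DFA) yields a family of training-set-consistent automata among which a selection heuristic must choose.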
