Pruning recurrent neural networks for improved generalization performance

The experimental results in this paper demonstrate that a simple pruning/retraining method effectively improves the generalization performance of recurrent neural networks trained to recognize regular languages. The technique also permits the extraction of symbolic knowledge in the form of deterministic finite-state automata (DFA) which are more consistent with the rules to be learned. Weight decay has also been shown to improve a network's generalization performance. Simulations with two small DFA (≤10 states) and a large finite-memory machine (64 states) demonstrate that the performance improvement due to pruning/retraining is generally superior to the improvement due to training with weight decay. In addition, there is no need to guess a 'good' decay rate.
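To make the procedure concrete, the sketch below shows one plausible prune/retrain loop for a recurrent network on a toy regular-language task: train, zero out the smallest-magnitude recurrent weights, then retrain with the pruned weights held at zero. This is a minimal illustration, not the authors' exact method: the paper's network architecture, pruning criterion, and benchmark DFA are not reproduced here, and the toy parity task, threshold, and helper names are assumptions. The `weight_decay` argument plays the role of the decay rate that the abstract notes must otherwise be guessed.

```python
# Minimal sketch (assumed details): magnitude-based pruning and retraining of a
# simple RNN, with optional L2 weight decay for comparison.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: accept binary strings with an even number of 1s (a 2-state DFA).
def make_batch(n=64, length=10):
    x = torch.randint(0, 2, (n, length, 1)).float()
    y = (x.sum(dim=(1, 2)) % 2 == 0).float()
    return x, y

class RNNClassifier(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        _, h = self.rnn(x)                  # final hidden state
        return self.out(h.squeeze(0)).squeeze(-1)

def train(model, steps, weight_decay=0.0, mask=None):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        x, y = make_batch()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if mask is not None:                # keep pruned recurrent weights at zero
            with torch.no_grad():
                model.rnn.weight_hh_l0 *= mask

model = RNNClassifier()
train(model, steps=500)                     # initial training

# Prune: zero the smallest-magnitude half of the recurrent weights, then retrain.
with torch.no_grad():
    w = model.rnn.weight_hh_l0
    threshold = w.abs().flatten().kthvalue(int(0.5 * w.numel())).values
    mask = (w.abs() > threshold).float()
    w *= mask
train(model, steps=200, mask=mask)          # retraining with pruned weights frozen

x, y = make_batch(256)
acc = ((model(x) > 0) == y.bool()).float().mean()
print(f"accuracy after prune/retrain: {acc:.3f}")
```

For the weight-decay baseline the abstract compares against, the same `train` call would be used without a mask but with a nonzero `weight_decay`, whose value must be chosen by hand.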
