Symbolic Knowledge Representation in Recurrent Neural Networks: Insights from Theoretical Models of

We give an overview of some of the fundamental issues found in the realm of recurrent neural networks. We use theoretical models of computation to characterize the representational, computational, and learning capabilities of recurrent network models. We discuss how results derived for deterministic models can be generalized to fuzzy models. We then address how these theoretical models can be utilized within the knowledge-based neurocomputing paradigm for training recurrent networks, for extracting symbolic knowledge from trained networks, and for improving network training and generalization performance by making effective use of prior knowledge about a problem domain.

3.1 Introduction

This chapter addresses some fundamental issues in regard to recurrent neural network architectures and learning algorithms, their computational power, their suitability for different classes of applications, and their ability to acquire symbolic knowledge through learning. We have found it convenient to investigate some of those issues in the paradigm of theoretical models of computation, formal languages, and dynamical systems theory. We will briefly outline some of the issues we discuss in this chapter. Neural networks were for a long time considered to belong outside the realm of mainstream artificial intelligence. The development of powerful new architectures
