The Neural Network Pushdown Automaton: Architecture, Dynamics and Training

Recurrent neural networks are dynamical network structures capable of processing and generating temporal information. To our knowledge, the earliest neural network model that processed temporal information was that of McCulloch and Pitts [McCulloch43]. Kleene [Kleene56] extended this work to show the equivalence of finite automata and McCulloch and Pitts' representation of nerve net activity. Minsky [Minsky67] showed that any hard-threshold neural network could represent a finite state automaton and developed a method for actually constructing a neural network finite state automaton. However, many different neural network models can be defined as recurrent; for example, see [Grossberg82] and [Hopfield82]. Our focus is on discrete-time recurrent neural networks that dynamically process temporal information, and it follows in the tradition of the dynamically driven (nonautonomous) recurrent network models defined by [Elman90, Jordan86, Narendra90, Pollack91, Tsoi94]. In particular, this paper develops a new model, a neural network pushdown automaton (NNPDA), which is a hybrid system that couples a recurrent network to an external stack memory. More importantly, an NNPDA should be capable of learning and recognizing some class of context-free grammars. As such, this model is a significant extension of previous work in which neural network finite state automata simulated and learned regular grammars. We explore the capabilities of such a model by inferring automata from sample strings, that is, the problem of grammatical inference. It is important to note that our focus is only on inference, not on prediction or translation. We are concerned with the problem of inferring an unknown system model from observed sample strings, not with predicting the next string element in a sequence. In some ways, our problem can be thought of as one of system identification [Ljung87].

[1]  S C Kleene,et al.  Representation of Events in Nerve Nets and Finite Automata , 1951 .

[2]  Demetri Psaltis,et al.  Higher order associative memories and their optical implementations , 1988, Neural Networks.

[3]  Paolo Frasconi,et al.  Computational capabilities of local-feedback recurrent networks acting as finite-state machines , 1996, IEEE Trans. Neural Networks.

[4]  Mikel L. Forcada,et al.  Second-Order Recurrent Neural Networks Can Learn Regular Grammars from Noisy Strings , 1995, IWANN.

[5]  C. Lee Giles,et al.  Extraction of rules from discrete-time recurrent neural networks , 1996, Neural Networks.

[6]  C. Lee Giles,et al.  Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks , 1992, Neural Computation.

[7]  Janet Wiles,et al.  Learning to count without a counter: A case study of dynamics and activation landscapes in recurrent networks , 1995 .

[8]  C. Lee Giles,et al.  Higher Order Recurrent Networks and Grammatical Inference , 1989, NIPS.

[9]  Michael C. Mozer,et al.  A Connectionist Symbol Manipulator that Discovers the Structure of Context-Free Languages , 1992, NIPS.

[10]  Kumpati S. Narendra,et al.  Identification and control of dynamical systems using neural networks , 1990, IEEE Trans. Neural Networks.

[11]  Wolfgang Maass,et al.  Lower Bounds for the Computational Power of Networks of Spiking Neurons , 1996, Neural Computation.

[12]  Mike Casey,et al.  The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction , 1996, Neural Computation.

[13]  Y. C. Lee,et al.  Turing equivalence of neural networks with second order connection weights , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[14]  C. Lee Giles,et al.  Experimental Comparison of the Effect of Order in Recurrent Neural Networks , 1993, Int. J. Pattern Recognit. Artif. Intell.

[15]  Vasant Honavar,et al.  A neural-network architecture for syntax analysis , 1999, IEEE Trans. Neural Networks.

[16]  C. Lee Giles,et al.  Extraction, Insertion and Refinement of Symbolic Rules in Dynamically Driven Recurrent Neural Networks , 1993 .

[17]  Jordan B. Pollack,et al.  Recursive Distributed Representations , 1990, Artif. Intell.

[18]  Raymond L. Watrous,et al.  Induction of Finite-State Languages Using Second-Order Recurrent Networks , 1992, Neural Computation.

[19]  Marvin Minsky,et al.  Computation : finite and infinite machines , 2016 .

[20]  C. Lee Giles,et al.  Constructive learning of recurrent neural networks: limitations of recurrent cascade correlation and a simple solution , 1995, IEEE Trans. Neural Networks.

[21]  M. Goudreau,et al.  First-order vs. Second-order Single Layer Recurrent Neural Networks , 1994 .

[22]  Padhraic Smyth,et al.  Discrete recurrent neural networks for grammatical inference , 1994, IEEE Trans. Neural Networks.

[23]  Paulo J. G. Lisboa,et al.  Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers , 1992, IEEE Trans. Neural Networks.

[24]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[25]  Garrison W. Cottrell,et al.  Time-delay neural networks: representation and induction of finite-state machines , 1997, IEEE Trans. Neural Networks.

[26]  C. Lee Giles,et al.  Using recurrent neural networks to learn the structure of interconnection networks , 1995, Neural Networks.

[27]  W. Pitts,et al.  A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.

[28]  Lennart Ljung,et al.  System Identification: Theory for the User , 1987 .

[29]  Ah Chung Tsoi,et al.  Locally recurrent globally feedforward networks: a critical review of architectures , 1994, IEEE Trans. Neural Networks.

[30]  Carl H. Smith,et al.  Inductive Inference: Theory and Methods , 1983, CSUR.

[31]  Stephen Grossberg,et al.  Classical and Instrumental Learning by Neural Networks , 1982 .

[32]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci.

[33]  C. L. Giles,et al.  Machine learning using higher order correlation networks , 1986 .

[34]  Geoffrey E. Hinton,et al.  A general framework for parallel distributed processing , 1986 .

[35]  Simon M. Lucas,et al.  Syntactic Neural Networks , 1990 .

[36]  Michael I. Jordan.  Attractor dynamics and parallelism in a connectionist sequential machine , 1990 .

[37]  O. Firschein,et al.  Syntactic pattern recognition and applications , 1983, Proceedings of the IEEE.

[38]  E. Mark Gold,et al.  Complexity of Automaton Identification from Given Data , 1978, Inf. Control.

[39]  Hava T. Siegelmann,et al.  Computational capabilities of recurrent NARX neural networks , 1997, IEEE Trans. Syst. Man Cybern. Part B.

[40]  Joydeep Ghosh,et al.  Efficient Higher-Order Neural Networks for Classification and Function Approximation , 1992, Int. J. Neural Syst.

[41]  守屋 悦朗,et al.  J. E. Hopcroft and J. D. Ullman, "Introduction to Automata Theory, Languages, and Computation", Addison-Wesley, modified A5 format, x+418 pp., ¥6,670, 1979 , 1980 .

[42]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[43]  Padhraic Smyth,et al.  Learning Finite State Machines With Self-Clustering Recurrent Networks , 1993, Neural Computation.

[44]  Peter Tiño,et al.  Learning and Extracting Initial Mealy Automata with a Modular Neural Network Model , 1995, Neural Comput.

[45]  Robert B. Allen,et al.  Connectionist Language Users , 1990 .

[46]  Pierre Roussel-Ragot,et al.  Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms , 1993, Neural Computation.

[47]  C. Lee Giles,et al.  Learning a class of large finite state machines with a recurrent neural network , 1995, Neural Networks.

[48]  C. Lee Giles,et al.  Extracting and Learning an Unknown Grammar with Recurrent Neural Networks , 1991, NIPS.

[49]  Karvel K. Thornber,et al.  Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks , 1998, IEEE Trans. Fuzzy Syst.

[50]  C. Lee Giles,et al.  Constructing deterministic finite-state automata in recurrent neural networks , 1996, JACM.

[51]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[52]  Cristopher Moore,et al.  Dynamical Recognizers: Real-Time Language Recognition by Analog Computers , 1998, Theor. Comput. Sci.

[53]  I. Noda,et al.  A learning method for recurrent networks based on minimization of finite automata , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[54]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[55]  Alberto Sanfeliu,et al.  Understanding Neural Networks for Grammatical Inference and Recognition , 1993 .

[56]  Alessandro Sperduti,et al.  Stability properties of labeling recursive auto-associative memory , 1995, IEEE Trans. Neural Networks.

[57]  Stefan C. Kremer,et al.  Finite State Automata that Recurrent Cascade-Correlation Cannot Represent , 1995, NIPS.

[58]  James P. Crutchfield,et al.  Computation at the Onset of Chaos , 1991 .

[59]  Yoh-Han Pao,et al.  Adaptive pattern recognition and neural networks , 1989 .

[60]  Srimat T. Chakradhar,et al.  First-order versus second-order single-layer recurrent neural networks , 1994, IEEE Trans. Neural Networks.

[61]  Kevin J. Lang.  Random DFA's can be approximately learned from sparse uniform examples , 1992, COLT '92.

[62]  Michael C. Mozer,et al.  A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction , 1993, NIPS.

[63]  Michael C. Mozer,et al.  Discovering the Structure of a Reactive Environment by Exploration , 1990, Neural Computation.

[64]  Joachim Diederich,et al.  The truth will come to light: directions and challenges in extracting the knowledge embedded within trained artificial neural networks , 1998, IEEE Trans. Neural Networks.

[65]  Stephen Grossberg,et al.  Studies of mind and brain , 1982 .

[66]  Michael A. Arbib,et al.  An Introduction to Formal Language Theory , 1988, Texts and Monographs in Computer Science.

[67]  Richard D. Braatz,et al.  On the "Identification and control of dynamical systems using neural networks" , 1997, IEEE Trans. Neural Networks.

[68]  Hava T. Siegelmann,et al.  On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci.

[69]  Alessandro Sperduti,et al.  On the Computational Power of Recurrent Neural Networks for Structures , 1997, Neural Networks.

[70]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[71]  Giovanni Soda,et al.  Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks , 1995, IEEE Trans. Knowl. Data Eng.

[72]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[73]  Hava T. Siegelmann,et al.  On the computational power of neural nets , 1992, COLT '92.

[74]  James L. McClelland,et al.  Finite State Automata and Simple Recurrent Networks , 1989, Neural Computation.

[75]  Jeffrey L. Elman,et al.  Distributed Representations, Simple Recurrent Networks, and Grammatical Structure , 1991, Mach. Learn.

[76]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .