Learning a class of large finite state machines with a recurrent neural network

Abstract One of the issues in any learning model is how it scales with problem size. The problem of learning finite state machines (FSMs) from examples with recurrent neural networks has been extensively explored; however, the results are somewhat disappointing, in that the machines that can be learned are too small to be competitive with existing grammatical inference algorithms. We show that a type of recurrent neural network (Narendra & Parthasarathy, 1990, IEEE Trans. Neural Networks, 1, 4–27), which has feedback but no hidden state neurons, can learn a special type of FSM called a finite memory machine (FMM) under certain constraints. These machines have a large number of states (simulations are for 256- and 512-state FMMs) but have minimal order, relatively small depth, and little logic when the FMM is implemented as a sequential machine.
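To make the machine class concrete, here is a minimal Python sketch, assuming the standard definition of a finite memory machine of order n: the present output is determined entirely by the last n inputs and the last n outputs. The Boolean output function f below is hypothetical, chosen only to illustrate the structure; it is not the machine studied in the paper. Because the state of such a machine is just a sliding window rather than an arbitrary internal state, a feedback network of the Narendra & Parthasarathy type, driven by tapped delay lines of past inputs and outputs with no hidden state neurons, realizes exactly this kind of map.

```python
# Minimal illustrative sketch of a finite memory machine (FMM) of order n.
# The output function f is an arbitrary (hypothetical) Boolean function of
# the last n inputs and the last n outputs; any such function defines a
# valid FMM.

def f(x_hist, y_hist):
    """Illustrative output logic over the two sliding windows."""
    return (x_hist[0] ^ y_hist[0]) & x_hist[-1]

def run_fmm(inputs, n=3):
    x_hist = [0] * n   # last n inputs, most recent first
    y_hist = [0] * n   # last n outputs, most recent first
    outputs = []
    for x in inputs:
        x_hist = [x] + x_hist[:-1]   # shift the new input into the window
        y = f(x_hist, y_hist)        # output depends only on the windows
        y_hist = [y] + y_hist[:-1]   # feed the output back, as the network does
        outputs.append(y)
    return outputs

print(run_fmm([1, 0, 1, 1, 0, 1, 0, 0]))  # -> [0, 0, 1, 0, 0, 1, 0, 0]
```

With n = 3 the window pair can take at most 2^6 = 64 configurations, so the window itself bounds the state count; the 256- and 512-state machines in the simulations correspond to larger orders.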

[1] Padhraic Smyth, et al. Discrete recurrent neural networks for grammatical inference, 1994, IEEE Trans. Neural Networks.

[2] Anthony J. Robinson, et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding, 1987, NIPS.

[3] J. J. Hopfield, et al. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[4] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.

[5] Taylor L. Booth, et al. Grammatical Inference: Introduction and Survey - Part I, 1975, IEEE Trans. Syst. Man Cybern.

[6] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[7] Stefan C. Kremer, et al. Finite State Automata that Recurrent Cascade-Correlation Cannot Represent, 1995, NIPS.

[8] Jing Peng, et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories, 1990, Neural Computation.

[9] Richard S. Sutton, et al. Connectionist Learning for Control, 1995.

[10] Richard S. Sutton, et al. Neural networks for control, 1990.

[11] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.

[12] Michael I. Jordan, Attractor dynamics and parallelism in a connectionist sequential machine, 1990.

[13] Stephen A. Billings, et al. Properties of neural networks with applications to modelling non-linear dynamical systems, 1992.

[14] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.

[15] R. R. Leighton, et al. The autoregressive backpropagation algorithm, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[16] Ronald J. Williams, et al. Gradient-based learning algorithms for recurrent connectionist networks, 1990.

[17] Kumpati S. Narendra, et al. Identification and control of dynamical systems using neural networks, 1990, IEEE Trans. Neural Networks.

[18] Raymond L. Watrous, et al. Induction of Finite-State Languages Using Second-Order Recurrent Networks, 1992, Neural Computation.

[19] Ron Sun, et al. Integrating rules and connectionism for robust commonsense reasoning, 1994, Sixth-generation computer technology series.

[20] C. Lee Giles, et al. Constructive learning of recurrent neural networks: limitations of recurrent cascade correlation and a simple solution, 1995, IEEE Trans. Neural Networks.

[21] Peter Tiño, et al. Learning and Extracting Initial Mealy Automata with a Modular Neural Network Model, 1995, Neural Comput.

[22] Dana Angluin, et al. On the Complexity of Minimum Inference of Regular Sets, 1978, Inf. Control.

[23] C. Lee Giles, et al. Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, 1992, Neural Computation.

[24] Ah Chung Tsoi, et al. Locally recurrent globally feedforward networks: a critical review of architectures, 1994, IEEE Trans. Neural Networks.

[25] J. Taylor, et al. Switching and finite automata theory, 2nd ed., 1980, Proceedings of the IEEE.

[26] C. Lee Giles, et al. Higher Order Recurrent Networks and Grammatical Inference, 1989, NIPS.

[27] Giovanni Soda, et al. Local Feedback Multilayered Networks, 1992, Neural Computation.

[28] Pierre Roussel-Ragot, et al. Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms, 1993, Neural Computation.

[29] Hava T. Siegelmann, et al. On the Computational Power of Neural Nets, 1995, J. Comput. Syst. Sci.

[30] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[31] K. P. Unnikrishnan, et al. Nonlinear prediction of speech signals using memory neuron networks, 1991, Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop.

[32] Anil Nerode, et al. Multiple Agent Hybrid Control Architecture, 1992, Hybrid Systems.

[33] John W. Brewer, et al. Application of Optimal Control and Optimal Regulator Theory to the "Integrated" Control of Insect Pests, 1975, IEEE Transactions on Systems, Man, and Cybernetics.

[34] A. Lapedes, et al. Nonlinear Signal Processing Using Neural Networks, 1987.

[35] Boris A. Trakhtenbrot, et al. Finite automata: behavior and synthesis, 1973.

[36] F. Fallside, et al. Neural networks for signal processing: proceedings of the 1991 IEEE workshop, 1991.

[37] Andrew G. Barto, et al. Connectionist learning for control, 1990.

[38] Anil Nerode, et al. Models for Hybrid Systems: Automata, Topologies, Controllability, Observability, 1992, Hybrid Systems.

[39] Kevin J. Lang, Random DFA's can be approximately learned from sparse uniform examples, 1992, COLT '92.

[40] Michael C. Mozer, et al. Discovering the Structure of a Reactive Environment by Exploration, 1990, Neural Computation.

[41] Eduardo Sontag, et al. Computational power of neural networks, 1995.

[42] LiMin Fu, et al. Neural networks in computer intelligence, 1994.

[43] José Carlos Príncipe, et al. The gamma model: a new neural model for temporal processing, 1992, Neural Networks.

[44] Taylor L. Booth, et al. Grammatical Inference: Introduction and Survey - Part I, 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] Eric A. Wan, et al. Time series prediction by using a connectionist network with internal delay lines, 1993.

[46] P. Ashar, et al. Sequential Logic Synthesis, 1991.

[47] Ah Chung Tsoi, et al. FIR and IIR Synapses, a New Neural Network Architecture for Time Series Modeling, 1991, Neural Computation.

[48] J. Stephen Judd, et al. Neural network design and the complexity of learning, 1990, Neural network modeling and connectionism.

[49] A. Lapedes, et al. Nonlinear signal processing using neural networks: Prediction and system modelling, 1987.

[50] Michael I. Jordan, et al. Advances in Neural Information Processing Systems, 1995.

[51] James L. McClelland, et al. Finite State Automata and Simple Recurrent Networks, 1989, Neural Computation.

[52] Les E. Atlas, et al. Recurrent neural networks and robust time series prediction, 1994, IEEE Trans. Neural Networks.

[53] P. S. Sastry, et al. Memory neuron networks for identification and control of dynamical systems, 1994, IEEE Trans. Neural Networks.

[54] Jeffrey D. Ullman, et al. Introduction to Automata Theory, Languages and Computation, 1979.

[55] Geoffrey E. Hinton, et al. A time-delay neural network architecture for isolated word recognition, 1990, Neural Networks.