Stable behavior in a recurrent neural network for a finite state machine

We consider how to train a recurrent neural network (RNN) so that it stably mimics a finite state machine (FSM), even on long input sequences. First, we examine the relationship between stable behavior and the internal representation of states, that is, the clusters formed by the outputs of the internal units. For this relationship, we prove that an RNN exhibits stable cluster transitions when a neuron activation parameter is larger than a certain finite value μ0. Second, to acquire stable behavior, we treat the internal representation required for stability as prior knowledge. This yields a new learning objective that includes an internal-representation term. We derive a Bayesian-style method to estimate the coefficients of the terms in this objective, which correspond to hyperparameters. Finally, experiments show that RNNs readily acquire stable behavior when trained with the proposed method.
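The role of the activation parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, that the parameter acts as the gain (steepness) of a logistic unit; the names `sigmoid`, `cluster_spread`, and `gain`, and the sample net inputs, are hypothetical and not taken from the paper. The sketch only shows that a larger gain pushes unit outputs toward the saturated values 0 and 1, which tightens the clusters; it does not reproduce the stability proof or the threshold value.

```python
import math

def sigmoid(x, gain):
    """Logistic unit; 'gain' is an assumed stand-in for the activation parameter."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def cluster_spread(net_inputs, gain):
    """Largest distance of any unit output from its nearest saturated value (0 or 1)."""
    outputs = [sigmoid(x, gain) for x in net_inputs]
    return max(min(o, 1.0 - o) for o in outputs)

# Hypothetical net inputs to four internal units (illustrative values only).
net_inputs = [-1.2, -0.4, 0.3, 0.9]

# The spread around the saturated corners shrinks as the gain grows.
spreads = [cluster_spread(net_inputs, g) for g in (1.0, 4.0, 16.0)]
for g, s in zip((1.0, 4.0, 16.0), spreads):
    print(f"gain={g:5.1f}  spread={s:.4f}")
```

Under this assumption, tighter clusters leave a wider margin between the regions that encode different states, which is the intuition behind requiring the parameter to exceed a finite threshold.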
