Training of a discrete recurrent neural network for sequence classification using a helper FFN

This research is concerned with a gradient descent training algorithm for a target network that uses a helper feed-forward network (FFN) to represent the cost function required for training the target network. A helper FFN is trained because the cost relation for the target is not differentiable. The transfer function of the trained helper FFN provides a differentiable cost function of the target network's parameter vector, allowing gradient search methods to find the optimum parameter values. The method is applied to the training of discrete recurrent neural networks (DRNNs), which are used as a tool for classifying temporal sequences of characters from some alphabet and for identifying a finite state machine (FSM) that may have produced those sequences. Classification of a sequence input to the DRNN is based on the terminal state of the network after the last element of the sequence has been processed. If the DRNN is to be used for classifying sequences, the terminal states for class 0 sequences must be distinct from the terminal states for class 1 sequences. The cost value used in training must therefore be a function of this disjointness and nothing more. The resulting cost relationship is discrete rather than continuous, so either derivative-free methods must be used or, alternatively, the method suggested in this paper. In the latter case, the transfer function of the helper FFN trained on the cost function is differentiable and can be used to train the DRNN by gradient descent.
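
As a rough illustration of the idea, and not the paper's actual implementation, the sketch below fits a small helper network to samples of a discrete cost (here assumed to be the number of terminal states shared by the two classes of a toy hard-threshold DRNN) and then takes a gradient step on the DRNN parameter vector through the differentiable helper. PyTorch, the network sizes, the toy parity data, the local sampling scheme, and all names are illustrative assumptions.

# Hypothetical sketch of the helper-FFN surrogate idea.
# Assumptions: a tiny hard-threshold DRNN, a collision-count cost, toy parity data.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_STATES, N_SYMBOLS = 4, 2                       # assumed DRNN size
P_DIM = N_SYMBOLS * N_STATES * N_STATES          # one transition matrix per input symbol

def drnn_terminal_state(params, seq):
    """Run a hard-threshold (discrete) recurrent net and return its terminal state."""
    W = params.view(N_SYMBOLS, N_STATES, N_STATES)
    state = torch.zeros(N_STATES)
    state[0] = 1.0                               # start in state 0
    for sym in seq:
        state = (W[sym] @ state > 0.5).float()   # non-differentiable threshold step
    return tuple(int(v) for v in state)

def discrete_cost(params, class0, class1):
    """Discrete cost: how many terminal states the two classes have in common."""
    t0 = {drnn_terminal_state(params, s) for s in class0}
    t1 = {drnn_terminal_state(params, s) for s in class1}
    return float(len(t0 & t1))

# Toy data (assumed example): class 0 = even number of 1s, class 1 = odd number of 1s.
class0 = [[0, 0], [1, 1, 0], [1, 0, 1]]
class1 = [[1], [0, 1, 0], [1, 1, 1, 0]]

helper = nn.Sequential(nn.Linear(P_DIM, 32), nn.Tanh(), nn.Linear(32, 1))
helper_opt = torch.optim.Adam(helper.parameters(), lr=1e-2)

theta = torch.randn(P_DIM, requires_grad=True)   # DRNN parameter vector

for outer in range(50):
    # 1) Sample parameter vectors near theta and record their discrete costs.
    samples = theta.detach() + 0.3 * torch.randn(64, P_DIM)
    costs = torch.tensor([[discrete_cost(p, class0, class1)] for p in samples])

    # 2) Fit the helper FFN so its transfer function approximates the cost surface.
    for _ in range(100):
        helper_opt.zero_grad()
        loss = nn.functional.mse_loss(helper(samples), costs)
        loss.backward()
        helper_opt.step()

    # 3) Take a gradient-descent step on theta through the differentiable helper.
    surrogate = helper(theta.unsqueeze(0)).squeeze()
    grad, = torch.autograd.grad(surrogate, theta)
    with torch.no_grad():
        theta -= 0.1 * grad

print("final discrete cost:", discrete_cost(theta.detach(), class0, class1))

Whether the helper is refitted locally around the current parameter vector, as assumed here, or fitted once over a wider region is a design choice the sketch does not settle; the essential point is only that the gradient is taken through the helper's smooth transfer function rather than through the discrete cost itself.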
