Injecting Nondeterministic Finite State Automata into Recurrent Neural Networks

Paolo Frasconi

In this paper we propose a method for injecting time-warping nondeterministic finite state automata into recurrent neural networks. The proposed algorithm takes as input a set of automata transition rules and produces a recurrent architecture whose connection weights are specified by means of linear constraints. In this way, the network is guaranteed to carry out the assigned automata rules, provided that the weights belong to the constrained domain and the inputs belong to an appropriate range of values, making a boolean interpretation possible. In a subsequent phase, the weights can be adapted by learning from examples in order to obtain the desired behavior on corrupted inputs. A distinctive feature of the proposed neural model is that it is no longer focused exclusively on learning, but also on the identification of significant architectural and weight constraints, derived systematically from automata rules that represent the partial domain knowledge on a given problem.

The ability to learn from examples is certainly the most appealing feature of neural networks. In the last few years, several researchers have used connectionist models for solving different kinds of problems, ranging from robot control to pattern recognition. Coping with the optimization of functions with several thousands of variables is quite common. Surprisingly, in many practical cases, global or near-global optimization is attained with rough numerical methods. For example, successful applications of neural nets to the recognition of handwritten characters (e.g., [18]) and to phoneme discrimination (e.g., [31]) have been proposed which do not report serious convergence problems. Some attempts to understand the theoretical reasons for the successes and failures of supervised learning schemes, and in particular of Backpropagation [19], [27], [23], [34], have been carried out [16], [11], which explain when such schemes are likely to succeed. These results should not
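To make the kind of construction described above concrete, here is a minimal sketch (in Python/NumPy; the function names and the toy automaton are invented for illustration and are not taken from the paper) of how NFA transition rules can be compiled into a recurrent network of threshold units: one "rule" unit per transition computes the AND of its source-state unit and its input-symbol unit, and each state unit computes the OR of the rule units leading into it. The particular weight values are just one feasible point of the kind of linear constraints mentioned in the abstract; any weights satisfying those inequalities realize the same automaton.

```python
# A minimal sketch, NOT the paper's exact construction: compiling NFA
# transition rules into a recurrent network of hard-threshold units.
import numpy as np

def step(z):
    # Hard threshold; unit outputs admit a boolean interpretation.
    return (z > 0).astype(float)

def build_nfa_network(n_states, n_symbols, rules):
    """rules: list of (i, k, j), meaning state i --symbol k--> state j."""
    n_rules = len(rules)
    W_sr = np.zeros((n_rules, n_states))   # state units -> rule units
    W_xr = np.zeros((n_rules, n_symbols))  # input units -> rule units
    W_rs = np.zeros((n_states, n_rules))   # rule units  -> next-state units
    b_r = -1.5 * np.ones(n_rules)          # AND: both inputs must be on
    b_s = -0.5 * np.ones(n_states)         # OR: one active rule suffices
    for r, (i, k, j) in enumerate(rules):
        # With bias -1.5, any AND weights w with 0.75 < w <= 1.5 satisfy
        # the linear constraints; 1.0 is a convenient feasible point.
        W_sr[r, i] = 1.0
        W_xr[r, k] = 1.0
        W_rs[j, r] = 1.0
    def transition(state, symbol):
        rule_act = step(W_sr @ state + W_xr @ symbol + b_r)  # AND layer
        return step(W_rs @ rule_act + b_s)                   # OR layer
    return transition

# Toy NFA over {a=0, b=1} accepting strings that contain "ab":
rules = [(0, 0, 0), (0, 1, 0),   # state 0 loops on both symbols ...
         (0, 0, 1),              # ... or nondeterministically reads the 'a'
         (1, 1, 2),              # 'b' completes "ab"
         (2, 0, 2), (2, 1, 2)]   # state 2 (accepting) is absorbing
net = build_nfa_network(n_states=3, n_symbols=2, rules=rules)

state = np.array([1.0, 0.0, 0.0])        # only the start state is active
for sym in [1, 0, 1, 0]:                 # input string "b a b a"
    state = net(state, np.eye(2)[sym])   # one-hot encode each symbol
print("active states:", np.nonzero(state)[0])  # -> [0 1 2]; state 2 reached
```

Since several state units can be active at once, the recurrence tracks the whole set of states the nondeterministic automaton could occupy rather than a single state. Replacing the hard threshold with a steep sigmoid would give a network whose weights, still initialized inside the feasible region, could then be adapted by gradient descent to cope with corrupted inputs, in the spirit of the learning phase the abstract describes.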

[1] C. L. Giles et al. Inserting rules into recurrent neural networks, 1992, Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop.

[2] C. Lee Giles et al. Training Second-Order Recurrent Neural Networks using Hints, 1992, ML.

[3] Yoshua Bengio et al. Learning the dynamic nature of speech with back-propagation for sequences, 1992, Pattern Recognit. Lett.

[4] Raymond L. Watrous et al. Induction of Finite-State Languages Using Second-Order Recurrent Networks, 1992, Neural Computation.

[5] C. Lee Giles et al. Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, 1992, Neural Computation.

[6] Paulo J. G. Lisboa et al. Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers, 1992, IEEE Trans. Neural Networks.

[7] Elie Bienenstock et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[8] Irving S. Reed et al. Including Hints in Training Neural Nets, 1991, Neural Computation.

[9] Giovanni Soda et al. An unified approach for integrating explicit knowledge and learning by example in recurrent networks, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[10] Yaser S. Abu-Mostafa et al. Learning from hints in neural networks, 1990, J. Complex.

[11] Jeffrey L. Elman et al. Finding Structure in Time, 1990, Cogn. Sci.

[12] James L. McClelland et al. Finite State Automata and Simple Recurrent Networks, 1989, Neural Computation.

[13] Geoffrey E. Hinton et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[14] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks, 1989, Neural Computation.

[15] M. W. Shields. An Introduction to Automata Theory, 1988.

[16] Marvin Minsky et al. Perceptrons: expanded edition, 1988.

[17] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.

[19] J. Shavlik. The Extraction of Refined Rules from Knowledge-based Neural Networks, 1993.

[20] Alberto Tesi et al. On the Problem of Local Minima in Backpropagation, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[21] W. S. McCulloch. A logical calculus of the ideas immanent in nervous activity, 1990, The Philosophy of Artificial Intelligence.

[22] Yann LeCun. Generalization and network design strategies, 1989.

[23] Lokendra Shastri et al. Speech recognition using connectionist networks, 1988.

[24] Yann LeCun et al. Learning processes in an asymmetric threshold network, 1986.

[25] Jeffrey D. Ullman et al. Introduction to Automata Theory, Languages and Computation, 1979.

[26] P. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[27] E. Polak. Introduction to linear and nonlinear programming, 1973.