Constructive learning of recurrent neural networks: limitations of recurrent cascade correlation and a simple solution

It is often difficult to predict the optimal neural network size for a particular application. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. might offer a solution to this problem. We prove that one method, recurrent cascade correlation, has fundamental representational limitations due to its topology, and therefore limitations in what it can learn: with monotone (e.g., sigmoid) or hard-threshold activation functions it cannot represent certain finite state automata. We give a preliminary approach to overcoming these limitations by devising a simple constructive training method that adds neurons during training while preserving the powerful fully-recurrent structure. We illustrate this approach with simulations that learn many examples of regular grammars that the recurrent cascade correlation method is unable to learn.
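The abstract only names the constructive idea; the following is a minimal sketch, under our own assumptions, of what such a trainer could look like: a small fully-recurrent network trained by backpropagation through time whose state is grown by one neuron whenever the loss stops improving, with all previously learned recurrent weights left trainable (in contrast to RCC, where earlier hidden units are frozen into a cascade). The parity task, the growth criterion, and every hyperparameter below are illustrative choices, not the authors' settings.

# Minimal sketch (not the authors' algorithm): a fully-recurrent net that grows
# one hidden unit when training stalls, keeping all existing weights trainable.
import numpy as np

rng = np.random.default_rng(0)

def make_parity_data(n_seqs=200, max_len=10):
    # Binary strings labeled by the parity of their 1s -- a simple regular language.
    data = []
    for _ in range(n_seqs):
        length = rng.integers(1, max_len + 1)
        bits = rng.integers(0, 2, size=length)
        data.append((bits.reshape(-1, 1).astype(float), float(bits.sum() % 2)))
    return data

class FullyRecurrentNet:
    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0, 0.5, (n_hidden, n_hidden))  # recurrent weights
        self.U = rng.normal(0, 0.5, (n_hidden, n_in))       # input weights
        self.b = np.zeros(n_hidden)
        self.v = rng.normal(0, 0.5, n_hidden)                # readout weights
        self.c = 0.0

    def forward(self, x_seq):
        hs = [np.zeros(self.W.shape[0])]
        for x in x_seq:
            hs.append(np.tanh(self.W @ hs[-1] + self.U @ x + self.b))
        y = 1.0 / (1.0 + np.exp(-(self.v @ hs[-1] + self.c)))  # accept/reject probability
        return hs, y

    def train_step(self, x_seq, label, lr=0.05):
        hs, y = self.forward(x_seq)
        dlogit = y - label                       # cross-entropy gradient w.r.t. the logit
        gW, gU = np.zeros_like(self.W), np.zeros_like(self.U)
        gb, gv = np.zeros_like(self.b), dlogit * hs[-1]
        dh = dlogit * self.v
        for t in range(len(x_seq), 0, -1):       # backpropagation through time
            da = dh * (1.0 - hs[t] ** 2)         # tanh derivative
            gW += np.outer(da, hs[t - 1])
            gU += np.outer(da, x_seq[t - 1])
            gb += da
            dh = self.W.T @ da
        self.W -= lr * gW; self.U -= lr * gU; self.b -= lr * gb
        self.v -= lr * gv; self.c -= lr * dlogit
        return -(label * np.log(y + 1e-9) + (1 - label) * np.log(1 - y + 1e-9))

    def add_neuron(self):
        # Grow the state by one unit; old weights are kept and remain trainable.
        def grow(mat, rows, cols):
            out = rng.normal(0, 0.1, (rows, cols))
            out[:mat.shape[0], :mat.shape[1]] = mat
            return out
        n = self.W.shape[0]
        self.W = grow(self.W, n + 1, n + 1)
        self.U = grow(self.U, n + 1, self.U.shape[1])
        self.b = np.append(self.b, 0.0)
        self.v = np.append(self.v, rng.normal(0, 0.1))

# Constructive outer loop: train, and grow the network when the loss stops improving.
data = make_parity_data()
net = FullyRecurrentNet(n_in=1, n_hidden=1)
prev = np.inf
for epoch in range(1, 301):
    loss = np.mean([net.train_step(xs, lab) for xs, lab in data])
    if epoch % 25 == 0:
        if prev - loss < 0.01 and net.W.shape[0] < 8:  # stalled: add a fully-connected unit
            net.add_neuron()
        prev = loss
        print(f"epoch {epoch:3d}  units {net.W.shape[0]:2d}  loss {loss:.3f}")

The key contrast with recurrent cascade correlation is in add_neuron: the new unit is wired into the full recurrent weight matrix and every existing weight stays subject to further training, rather than being frozen as the input of a new cascaded unit.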
