Modularization by Cascading Neural Networks

The application of artificial neural networks to complex real-world problems usually requires a modularization of the network architecture. The individual modules deal with subtasks that are defined by a decomposition of the problem. So far, this modularization has usually been done heuristically, and little is known about principled methods for adapting the network structure to the problem at hand. Incrementally constructed cascade architectures are a promising approach for growing networks according to the needs of the problem. This paper discusses the properties of the recently proposed direct cascade architecture DCA (Littmann & Ritter 1992). One important virtue of DCA is that it allows the cascading of entire subnetworks, even if these admit no error backpropagation. Exploiting this flexibility and using local linear map (LLM) networks as cascaded elements, we show that the performance of the resulting network cascades can be greatly enhanced compared to the performance of a single network. Our results for the Mackey-Glass time series prediction task indicate that such deeply cascaded network architectures achieve good generalization even on small data sets, where shallow, broad architectures of comparable size suffer from overfitting. We conclude that the DCA approach offers a powerful and flexible alternative to existing schemes, such as the mixtures of experts approach, for the construction of modular systems from a wide range of subnetwork types.
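The cascading idea can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the abstract: each stage is assumed to receive the original input together with the previous stage's output, and the LLM networks are replaced by a much simpler stand-in module (fixed random tanh features plus a direct linear term, with a least-squares readout) that is trained without backpropagation. The names `fit_module`, `predict_module`, `fit_cascade`, and `predict_cascade` are hypothetical.

```python
import numpy as np


def _design(module_weights, X_in):
    """Design matrix: random tanh features, the raw inputs, and a bias column."""
    W, b = module_weights
    H = np.tanh(X_in @ W + b)
    return np.hstack([H, X_in, np.ones((X_in.shape[0], 1))])


def fit_module(X_in, y, n_hidden, rng):
    """Stand-in module (NOT an LLM network): fixed random hidden layer,
    readout fitted by linear least squares, so no backpropagation is needed."""
    W = rng.normal(size=(X_in.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    beta, *_ = np.linalg.lstsq(_design((W, b), X_in), y, rcond=None)
    return W, b, beta


def predict_module(module, X_in):
    W, b, beta = module
    return _design((W, b), X_in) @ beta


def fit_cascade(X, y, n_stages, n_hidden, seed=0):
    """Train stages one after another. Assumption: every stage after the
    first sees the original input plus the previous stage's output."""
    rng = np.random.default_rng(seed)
    modules, prev = [], None
    for _ in range(n_stages):
        X_in = X if prev is None else np.hstack([X, prev[:, None]])
        module = fit_module(X_in, y, n_hidden, rng)
        modules.append(module)
        prev = predict_module(module, X_in)  # passed on to the next stage
    return modules


def predict_cascade(modules, X):
    """Run the input through all stages; the last stage's output is returned."""
    prev = None
    for module in modules:
        X_in = X if prev is None else np.hstack([X, prev[:, None]])
        prev = predict_module(module, X_in)
    return prev
```

Because the previous output enters each stand-in module through a direct linear term, a later stage can always reproduce its predecessor, so in this sketch the training error does not increase from stage to stage. This says nothing about generalization, which is the question the paper addresses.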

[1] Scott E. Fahlman, et al. The Recurrent Cascade-Correlation Architecture, 1990, NIPS.

[2] J. Nadal, et al. Learning in feedforward layered networks: the tiling algorithm, 1989.

[3] John A. Hertz, et al. Exploiting Neurons with Localized Receptive Fields to Learn Chaos, 1990, Complex Syst.

[4] Helge Ritter, et al. Learning with the Self-Organizing Map, 1991.

[5] Geoffrey E. Hinton, et al. Evaluation of Adaptive Mixtures of Competing Experts, 1990, NIPS.

[6] Marcus Frean, et al. The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks, 1990, Neural Computation.

[7] L. Glass, et al. Oscillation and chaos in physiological control systems, 1977, Science.

[8] Helge J. Ritter, et al. Learning and Generalization in Cascade Network Architectures, 1996, Neural Computation.

[9] Helge Ritter, et al. Learning 3D-shape perception with local linear maps, 1992, Proceedings 1992 IJCNN International Joint Conference on Neural Networks.

[10] Albert Y. Zomaya, et al. Toward generating neural network structures for function approximation, 1994, Neural Networks.

[11] Michael C. Mozer, et al. A Focused Backpropagation Algorithm for Temporal Pattern Recognition, 1989, Complex Syst.

[12] Helge Ritter, et al. Cascade network architectures, 1992, Proceedings 1992 IJCNN International Joint Conference on Neural Networks.

[13] Helge Ritter, et al. Analysis and Applications of the Direct Cascade Architecture, 1994.

[14] David Haussler, et al. What Size Net Gives Valid Generalization?, 1989, Neural Computation.

[15] John M. Zelle, et al. Growing layers of perceptrons: introducing the Extentron algorithm, 1992, Proceedings 1992 IJCNN International Joint Conference on Neural Networks.

[16] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[17] Michael I. Jordan, et al. A Competitive Modular Connectionist Architecture, 1990, NIPS.

[18] D. E. Rumelhart, et al. Learning internal representations by back-propagating errors, 1986.

[19] David S. Touretzky, et al. Connectionist models: proceedings of the 1990 summer school, 1991.

[20] Helge J. Ritter, et al. Generalization Abilities of Cascade Network Architecture, 1992, NIPS.

[21] James D. Keeler, et al. Predicting the Future: Advantages of Semilocal Units, 1991, Neural Computation.

[22] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[23] Helge J. Ritter, et al. Neural computation and self-organizing maps - an introduction, 1992, Computation and neural systems series.

[24] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.