III.3 – Theory of the Backpropagation Neural Network

Publisher Summary

This chapter presents a survey of the elementary theory of the basic backpropagation neural network architecture, covering architectural design, performance measurement, function approximation capability, and learning. The survey includes a formulation of the backpropagation architecture that makes it a valid neural network, and a proof that the backpropagation mean squared error function exists and is differentiable. Also included is a theorem showing that any L2 function can be approximated to any desired degree of accuracy by a three-layer backpropagation neural network. An appendix presents a speculative neurophysiological model illustrating how the backpropagation architecture might plausibly be implemented in the mammalian brain for corticocortical learning between nearby regions of cerebral cortex. One of the crucial decisions in the design of the backpropagation architecture is the selection of a sigmoidal activation function.
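The ingredients named in the summary can be sketched concretely. The following is a minimal illustration, not code from the chapter: a three-layer network (input, one hidden layer of sigmoidal units, sigmoidal output) trained by gradient descent on the mean squared error. The XOR dataset, learning rate, unit counts, and weight initialization are all illustrative choices; the point is that the sigmoid is differentiable everywhere, so the MSE is a differentiable function of the weights and its gradient can be computed layer by layer with the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Sigmoidal activation: smooth, monotone, and differentiable everywhere,
    # which is what makes the network's MSE differentiable in the weights.
    return 1.0 / (1.0 + np.exp(-z))

# Toy target mapping (XOR), chosen only for illustration: it cannot be
# realized by a network without a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Three layers: 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary).
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

def forward(X):
    # Forward pass through both sigmoidal layers.
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

initial_mse = float(np.mean((forward(X)[1] - y) ** 2))

lr = 2.0  # illustrative learning rate for full-batch gradient descent
for _ in range(10000):
    h, out = forward(X)
    err = out - y                          # dE/d(output), up to a constant
    d_out = err * out * (1.0 - out)        # chain rule through output sigmoid
    d_h = (d_out @ W2.T) * h * (1.0 - h)   # error backpropagated to hidden layer
    # Gradient-descent weight updates
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

final_mse = float(np.mean((forward(X)[1] - y) ** 2))
```

After training, the mean squared error falls well below its initial value, which is the behavior the chapter's differentiability result licenses: a well-defined gradient exists at every point, so steepest descent on the MSE is possible.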
