30 years of adaptive neural networks: perceptron, Madaline, and backpropagation

Fundamental developments in feedforward artificial neural networks from the past thirty years are reviewed. The history, origins, operating characteristics, and basic theory of several supervised neural-network training algorithms (including the perceptron rule, the least-mean-square algorithm, three Madaline rules, and the backpropagation technique) are described. The concept underlying these iterative adaptation algorithms is the minimal disturbance principle, which suggests that during training it is advisable to inject new information into a network in a manner that disturbs stored information to the smallest extent possible. The two principal kinds of online rules that have been developed for altering the weights of a network are examined for both single-threshold elements and multielement networks. They are error-correction rules, which alter the weights of a network to correct the error in the output response to the present input pattern, and gradient rules, which alter the weights of a network during each pattern presentation by gradient descent with the objective of reducing mean-square error (averaged over all training patterns).
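The distinction between the two kinds of online rules can be illustrated for a single threshold element. The following is a minimal sketch, not the formulation used in the paper: the function names, the step sizes `lr` and `mu`, and the bipolar AND training set are all assumptions chosen for illustration. `perceptron_update` stands in for an error-correction rule (weights change only when the thresholded response to the present pattern is wrong), and `lms_update` stands in for a gradient rule (a small descent step on the squared error of the present pattern, taken at every presentation).

```python
def dot(w, x):
    """Inner product of a weight vector and an input pattern."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, x):
    """Thresholded (bipolar) output of a single element."""
    return 1 if dot(w, x) >= 0 else -1


def perceptron_update(w, x, d, lr=1.0):
    """Error-correction step: adapt only if the thresholded output is wrong."""
    if classify(w, x) == d:
        return list(w)  # response already correct, stored weights undisturbed
    return [wi + lr * d * xi for wi, xi in zip(w, x)]


def lms_update(w, x, d, mu=0.05):
    """Gradient step on the instantaneous squared error (d - w.x)^2."""
    err = d - dot(w, x)  # linear error, measured before the threshold
    return [wi + 2 * mu * err * xi for wi, xi in zip(w, x)]


def train(update, patterns, epochs=50, **kwargs):
    """Present every pattern once per epoch, adapting after each presentation."""
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for x, d in patterns:
            w = update(w, x, d, **kwargs)
    return w


# Hypothetical training set: logical AND with bipolar values.
# The first input component is a bias input fixed at +1.
AND_PATTERNS = [
    ([1, -1, -1], -1),
    ([1, -1, 1], -1),
    ([1, 1, -1], -1),
    ([1, 1, 1], 1),
]

w_perceptron = train(perceptron_update, AND_PATTERNS)
w_lms = train(lms_update, AND_PATTERNS)
```

In this sketch the error-correction rule stops adapting once every pattern is classified correctly, whereas the gradient rule keeps making small adjustments at every presentation and settles near the least-squares weights for this training set.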
