Progress in supervised neural networks

Theoretical results concerning the capabilities and limitations of various neural network models are summarized, and some of their extensions are discussed. The network models considered are divided into two basic categories: static networks and dynamic networks. Unlike static networks, dynamic networks have memory. Dynamic networks fall into three groups: networks with feedforward dynamics, networks with output feedback, and networks with state feedback; state-feedback networks are emphasized in this work. Most of the networks discussed are trained using supervised learning.
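The split between static and dynamic networks, and among the three kinds of dynamic networks, comes down to where memory lives. The sketch below is a minimal illustration under stated assumptions: scalar inputs, tanh units, and hand-picked, untrained weights. The function names and weight values are hypothetical and not taken from the paper, only the forward pass is shown (supervised training is omitted), and the state-feedback case is written as a simple Elman-style recurrence as one representative choice.

```python
import math
from typing import List, Sequence

# Hypothetical illustration: scalar input sequence xs, tanh units, fixed
# (untrained) weights. Only the forward pass is shown; supervised training
# of the weights is omitted.


def static_net(xs: Sequence[float], w: float = 1.0, b: float = 0.0) -> List[float]:
    """Static network: y[t] depends only on the current input x[t]."""
    return [math.tanh(w * x + b) for x in xs]


def feedforward_dynamic_net(xs: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Feedforward dynamics: a tapped delay line gives y[t] access to
    x[t], x[t-1], ..., x[t-len(taps)+1], i.e. a finite memory of past inputs."""
    ys = []
    for t in range(len(xs)):
        s = sum(w * xs[t - k] for k, w in enumerate(taps) if t - k >= 0)
        ys.append(math.tanh(s))
    return ys


def output_feedback_net(xs: Sequence[float], w_x: float = 1.0,
                        w_y: float = 0.5) -> List[float]:
    """Output feedback: the previous output y[t-1] is fed back as an extra input."""
    ys: List[float] = []
    y_prev = 0.0
    for x in xs:
        y_prev = math.tanh(w_x * x + w_y * y_prev)
        ys.append(y_prev)
    return ys


def state_feedback_net(xs: Sequence[float], W_h: Sequence[Sequence[float]],
                       w_x: Sequence[float], v: Sequence[float]) -> List[float]:
    """State feedback (Elman-style recurrence): the hidden state
    h[t] = tanh(W_h h[t-1] + w_x x[t]) is fed back; the output is y[t] = v . h[t]."""
    n = len(w_x)
    h = [0.0] * n
    ys: List[float] = []
    for x in xs:
        # The comprehension reads the previous h for every unit before h is rebound.
        h = [math.tanh(sum(W_h[i][j] * h[j] for j in range(n)) + w_x[i] * x)
             for i in range(n)]
        ys.append(sum(v[i] * h[i] for i in range(n)))
    return ys


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]  # a single nonzero input at t = 0
    print("static         ", [round(y, 3) for y in static_net(impulse)])
    print("feedforward dyn", [round(y, 3) for y in
                              feedforward_dynamic_net(impulse, [1.0, 0.5, 0.25])])
    print("output feedback", [round(y, 3) for y in output_feedback_net(impulse)])
    print("state feedback ", [round(y, 3) for y in state_feedback_net(
        impulse, W_h=[[0.5, 0.2], [-0.3, 0.4]], w_x=[1.0, 0.5], v=[1.0, -1.0])])
```

Driving each network with a unit impulse makes the distinction visible: the static network's output returns to zero at the next step, the tapped-delay-line network forgets the impulse once it leaves the `len(taps)`-step window, and both feedback networks carry a decaying trace of it on every later step of this short run.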
