Nonmonotonic activation functions in multilayer perceptrons

Multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are the two most common types of feedforward neural networks used for pattern classification and continuous function approximation. MLPs are characterized by slow learning speed, low memory retention, and small node requirements, while RBFNs are known to have high learning speed and high memory retention, but large node requirements. This dissertation asks and answers the question: "Can we do better?" Two types of neural network architectures are introduced: the hyper-ridge and the hyper-hill. A hyper-ridge network is a perceptron with no hidden layers and an activation function of the form $g(h) = \operatorname{sgn}(c^2 - h^2)$, where $h$ is the net input and $c$ is a constant "width"; a hyper-hill network is a continuous multilayer version with $g(h) = \exp(-h^2/c^2)$. Throughout this dissertation, theoretical and empirical evidence is presented which strongly indicates that hyper-hills behave similarly to MLPs when the input dimension is high but are more similar to RBFNs when the input dimension is low. Additionally, the fact that hyper-hills learn faster than MLPs and require fewer nodes, yet do not suffer the "curse of dimensionality" associated with RBFNs, is offered as evidence that hyper-hills fill a niche between MLPs and RBFNs.
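For concreteness, the two activation functions can be written down directly; the following is a minimal NumPy sketch in which the function names, the vectorized implementation, and the default width $c$ are our own illustration, since the abstract specifies only the formulas. Note that both functions are nonmonotonic: they respond maximally near $h = 0$ and fall off as $|h|$ grows, in contrast to the monotonic sigmoid units of a standard MLP.

```python
import numpy as np

# Hypothetical sketch: only the formulas come from the dissertation; the
# names, NumPy implementation, and default width c = 1 are assumptions.

def hyper_ridge(h, c=1.0):
    """Hyper-ridge activation g(h) = sgn(c^2 - h^2): outputs +1 when the
    net input lies inside the band |h| < c, and -1 outside it."""
    return np.sign(c**2 - h**2)

def hyper_hill(h, c=1.0):
    """Hyper-hill activation g(h) = exp(-h^2 / c^2): a smooth,
    differentiable analogue of the hyper-ridge, suitable for use in
    hidden layers trained by backpropagation."""
    return np.exp(-(h**2) / c**2)

# Both peak around h = 0 and decay (or switch off) for large |h|.
h = np.linspace(-3.0, 3.0, 7)
print(hyper_ridge(h, c=1.5))
print(hyper_hill(h, c=1.5))
```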
