DESIGN OF NEURAL NETWORK FILTERS

The subject of this Ph.D. thesis is the design of neural network filters. Neural network filters may be viewed as an extension of classical linear adaptive filters to nonlinear modeling tasks. We focus on neural network architectures that implement the non-recursive, nonlinear adaptive model with additive error. The objective is to clarify a number of phases involved in the design of neural network filter architectures in connection with “black box” modeling tasks such as system identification, inverse modeling and time series prediction. The major contributions comprise:

• The development of an architecture taxonomy based on formulating a canonical filter representation. The central element of the taxonomy is the distinction between global and local models. The taxonomy leads to the classification of a number of existing neural network architectures and, in addition, suggests the potential development of novel structures. Various architectures are reviewed and interpreted. We attach particular importance to interpretations of the multi-layer perceptron neural network.

• Formulation of a generic nonlinear filter architecture which consists of a combination of the canonical filter and a preprocessing unit. The architecture may be viewed as a heterogeneous three-layer neural network. A number of preprocessing methods are suggested with the aim of circumventing the “curse of dimensionality” without reducing the performance significantly.

• Discussion of various algorithms for estimating the characteristic model weights (parameters). We suggest efficient implementations of standard first- and second-order optimization algorithms for layered architectures. In addition, a weight initialization algorithm for the two-layer perceptron neural network is developed in order to speed up convergence.

• Clarification and discussion of fundamental limitations in the search for optimal network architectures, based upon a decomposition of the average generalization error called the model error decomposition. This includes a discussion of the use of regularization.

• The development and discussion of a novel generalization error estimator, GEN, which is valid for incomplete, nonlinear models. The ability to deal with incomplete models is particularly important when performing “black box” modeling. The models are assumed to be estimated by minimizing the least squares cost function with a regularization term. The estimator is based on a statistical framework and may be viewed as an extension of Akaike’s classical FPE estimator and Moody’s GPE estimator.
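To make the setting concrete, the following is a minimal sketch of the kind of model and estimator the abstract refers to. It is not the thesis's method: it uses a linear filter on a tapped delay line as a stand-in for the non-recursive model with additive error, fits it by least squares with a weight-decay term, and evaluates Akaike's classical FPE, (N + m)/(N - m) times the training mean squared error, where N is the number of training samples and m the number of weights. The function names `delay_embed`, `fit_regularized_ls` and `fpe` and the symbol `kappa` are hypothetical, chosen only for illustration. FPE in this form presumes an unregularized (`kappa = 0`), complete model, which is the limitation the GEN estimator is said to address.

```python
import numpy as np

def delay_embed(x, L):
    """Rows are z(k) = [x(k), x(k-1), ..., x(k-L+1)] for k = L-1, ..., len(x)-1."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[L - 1 - i : len(x) - i] for i in range(L)])

def fit_regularized_ls(Z, y, kappa=0.0):
    """Minimize (1/N) * ||y - Z w||^2 + kappa * ||w||^2 over w (assumed cost)."""
    N, m = Z.shape
    A = Z.T @ Z / N + kappa * np.eye(m)
    b = Z.T @ y / N
    return np.linalg.solve(A, b)

def fpe(train_mse, N, m):
    """Akaike's FPE for an unregularized least-squares fit with m weights."""
    if N <= m:
        raise ValueError("FPE requires more samples than weights (N > m)")
    return train_mse * (N + m) / (N - m)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(500)
    Z = delay_embed(x, L=4)
    w_true = np.array([0.5, -0.3, 0.2, 0.1])
    y = Z @ w_true + 0.1 * rng.standard_normal(Z.shape[0])  # additive error
    w_hat = fit_regularized_ls(Z, y)                        # kappa = 0
    mse = np.mean((y - Z @ w_hat) ** 2)
    print("training MSE:", mse, "FPE:", fpe(mse, *Z.shape))
```

The training error is an optimistic estimate of the generalization error, and the factor (N + m)/(N - m) corrects for this on average under FPE's assumptions; the correction grows as the number of weights approaches the number of samples.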
