Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks

The problem of multiclass pattern classification using adaptive layered networks is addressed. A special class of networks, namely feed-forward networks with a linear final layer, that perform generalized linear discriminant analysis is discussed. This class is sufficiently generic to encompass the behavior of arbitrary feed-forward nonlinear networks. Training the network consists of a least-squares approach which combines a generalized inverse computation to solve for the final-layer weights with a nonlinear optimization scheme to solve for the parameters of the nonlinearities. A general analytic form for the feature extraction criterion is derived, and it is interpreted for specific forms of target coding and error weighting. An important aspect of the approach is to exhibit how a priori information regarding nonuniform class membership, uneven distribution between training and test sets, and misclassification costs may be exploited in a regularized manner in the training phase of networks.
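The core of the training scheme described above, solving for the linear final-layer weights by a generalized (Moore-Penrose) inverse on top of fixed hidden-layer nonlinearities, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian radial-basis hidden layer, the toy two-class data, and the 1-of-K target coding are all assumptions chosen for concreteness, and the hidden-layer parameters (centers, width) are simply fixed rather than optimized by the paper's nonlinear scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data in 2-D (illustrative only): one cluster per class.
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
# 1-of-K target coding: row n is the indicator vector of x_n's class.
T = np.vstack([np.tile([1.0, 0.0], (50, 1)), np.tile([0.0, 1.0], (50, 1))])

# Fixed hidden-layer nonlinearities: Gaussian radial basis functions with
# centers drawn from the data and a preset width (normally these would be
# tuned by the nonlinear optimization stage).
centers = X[rng.choice(len(X), 10, replace=False)]
width = 1.0

def hidden(X):
    # Phi[n, j] = exp(-||x_n - c_j||^2 / (2 * width^2)), plus a bias column.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([Phi, np.ones((len(X), 1))])

# Linear final layer solved in one step by the generalized inverse:
# W minimizes ||Phi W - T||_F^2, i.e. W = Phi^+ T.
Phi = hidden(X)
W = np.linalg.pinv(Phi) @ T

# Classify by the largest output component.
pred = (Phi @ W).argmax(1)
true = T.argmax(1)
print("training accuracy:", (pred == true).mean())
```

Because the final layer is linear, only the hidden-layer parameters require iterative nonlinear optimization; each candidate setting of those parameters yields its optimal output weights in closed form via the pseudo-inverse.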
