On the Discriminatory Power of Adaptive Feed-Forward Layered Networks

This correspondence expands the theoretical framework linking discriminant analysis and adaptive feed-forward layered linear-output networks used as mean-square classifiers. This has the advantage of providing stronger theoretical justification for the use of these nets in pattern classification and of giving better insight into their behavior and use. The authors prove that, under reasonable assumptions, minimizing the mean-square error at the network output is equivalent to minimizing: 1) the difference between the optimum value of a familiar discriminant criterion and the value of this criterion evaluated in the space spanned by the outputs of the final hidden layer, and 2) the difference between the values of the same discriminant criterion evaluated in the desired-output and actual-output subspaces. The authors also show, under specific constraints, how to solve the following problem: given a feature extraction criterion, select the target coding scheme so that this criterion is maximized at the output of the network's final hidden layer. Other properties of these networks are also explored.
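
Schematically, and in notation assumed here rather than taken from the paper itself (write $J(\cdot)$ for the discriminant criterion, $J^{*}$ for its optimum value, $z$ for the final-hidden-layer outputs, $t$ for the desired outputs, and $y$ for the actual outputs), the stated equivalence amounts to a decomposition of the form

\[
  E_{\mathrm{MSE}}
  \;\sim\;
  \underbrace{\bigl(J^{*} - J(z)\bigr)}_{\text{hidden-layer term}}
  \;+\;
  \underbrace{\bigl(J(t) - J(y)\bigr)}_{\text{output-subspace term}},
\]

up to additive constants and weightings that depend on the paper's assumptions. Driving the mean-square error down therefore pushes the hidden-layer representation toward the optimum of the discriminant criterion while simultaneously aligning the actual-output subspace with the desired-output subspace.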
