Statistical risk analysis for classification and feature extraction by multilayer perceptrons

We investigate the training of multilayer perceptrons with the commonly used mean square error (MSE) criterion, and demonstrate a number of novel connections between neural network operations and Bayes risk analysis. Although previous research establishes a number of these connections starting from seemingly different criteria, we develop a common statistical framework that yields generalized versions of most, if not all, of these results, together with several new ones. We discuss the following: (1) We present two equivalent cost functions and show that the MSE at the network output is equivalent to these cost functions for large samples. (2) We show that if the network performs a weighted classification, then the network output estimates the conditional risk. (3) We then show that if the final layer of the network is linear, minimizing the MSE at the output also maximizes a generalized criterion for nonlinear discriminant analysis (NDA). (4) We show that for a network with a linear output layer, the outputs sum to one and behave like probabilities. This new result allows us to estimate conditional risks at the network output and to perform NDA at the final hidden layer. (5) Results for uniform costs show that the MSE at the output is a tight upper bound on the error probability of the Bayes decision rule. A numerical sketch of points (1) and (4) follows below.
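The following is a minimal numerical sketch, not the paper's implementation, illustrating the flavor of results (1) and (4): a one-hidden-layer perceptron with a linear output layer is trained with the MSE criterion on one-hot targets for a two-class Gaussian problem with uniform costs, in which case the outputs should approximate the posteriors P(c|x) and sum to roughly one. All layer sizes, learning rates, and distribution parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes on the real line, equal priors.
n = 2000
x0 = rng.normal(-1.0, 1.0, n)           # class 0 samples
x1 = rng.normal(+1.0, 1.0, n)           # class 1 samples
x = np.concatenate([x0, x1])[:, None]
t = np.zeros((2 * n, 2))
t[:n, 0] = 1.0                           # one-hot targets
t[n:, 1] = 1.0

# MLP: 1 input -> 10 tanh hidden units -> 2 linear outputs.
H = 10
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 2)); b2 = np.zeros(2)

lr = 0.05
for epoch in range(2000):
    h = np.tanh(x @ W1 + b1)             # hidden-layer activations
    y = h @ W2 + b2                      # linear output layer
    err = y - t                          # gradient of 0.5 * MSE w.r.t. y
    # Backpropagation of the MSE criterion.
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Compare trained outputs with the true posteriors on a grid of inputs.
xs = np.linspace(-4, 4, 9)[:, None]
h = np.tanh(xs @ W1 + b1)
y = h @ W2 + b2
p1 = 1.0 / (1.0 + np.exp(-2.0 * xs))     # true P(class 1 | x) for these Gaussians
for xi, yi, pi in zip(xs.ravel(), y, p1.ravel()):
    print(f"x={xi:+.1f}  outputs={yi.round(3)}  sum={yi.sum():.3f}  true P(1|x)={pi:.3f}")
```

Near the data, the two outputs should approach (P(0|x), P(1|x)) and their sum should stay close to one, consistent with the posterior-estimation interpretation of MSE training; far from the data the approximation degrades, since the large-sample argument only constrains the network where inputs actually occur.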
