Neural Network Classifiers Estimate Bayesian a posteriori Probabilities

Many neural network classifiers provide outputs that estimate Bayesian a posteriori probabilities. When the estimation is accurate, network outputs can be treated as probabilities and sum to one. Simple proofs show that Bayesian probabilities are estimated when desired network outputs are 1-of-M (one output unity, all others zero) and a squared-error or cross-entropy cost function is used. Results of Monte Carlo simulations performed using multilayer perceptron (MLP) networks trained with backpropagation, radial basis function (RBF) networks, and high-order polynomial networks graphically demonstrate that network outputs provide good estimates of Bayesian probabilities. Estimation accuracy depends on network complexity, the amount of training data, and the degree to which training data reflect true likelihood distributions and a priori class probabilities. Interpretation of network outputs as Bayesian probabilities allows outputs from multiple networks to be combined for higher-level decision making, simplifies creation of rejection thresholds, makes it possible to compensate for differences between pattern class probabilities in training and test data, allows outputs to be used to minimize alternative risk functions, and suggests alternative measures of network performance.
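The central claim above can be illustrated with a minimal numerical sketch (not taken from the paper's simulations, which used MLP, RBF, and polynomial networks): for a two-class problem with known Gaussian class-conditional densities, a single sigmoid unit trained by gradient descent on a squared-error cost with 0/1 targets (the 1-of-M coding for M = 2) converges toward the true Bayes posterior P(class 1 | x). The densities, learning rate, and training schedule here are illustrative choices, not values from the paper.

```python
import math
import random

random.seed(0)

# Illustrative two-class problem: equal priors, Gaussian class-conditional
# densities class 0 ~ N(-1, 1) and class 1 ~ N(+1, 1).
def sample(n):
    data = []
    for _ in range(n):
        c = random.randint(0, 1)
        x = random.gauss(-1.0 if c == 0 else 1.0, 1.0)
        data.append((x, c))
    return data

def true_posterior(x):
    # Exact Bayes posterior P(class 1 | x) for the densities above;
    # it works out to a sigmoid: 1 / (1 + exp(-2x)).
    l0 = math.exp(-0.5 * (x + 1.0) ** 2)
    l1 = math.exp(-0.5 * (x - 1.0) ** 2)
    return l1 / (l0 + l1)

# Single sigmoid unit y = sigmoid(w*x + b), trained with per-sample
# gradient descent on the squared error (y - c)^2 against 0/1 targets.
w, b = 0.0, 0.0
lr = 0.1
train = sample(5000)
for _ in range(20):  # a few passes over the training data
    for x, c in train:
        y = 1.0 / (1.0 + math.exp(-(w * x + b)))
        g = (y - c) * y * (1.0 - y)  # d(squared error)/d(pre-activation)
        w -= lr * g * x
        b -= lr * g

# The trained output should track the Bayes posterior across the input range.
for x in (-2.0, 0.0, 2.0):
    y = 1.0 / (1.0 + math.exp(-(w * x + b)))
    print(f"x={x:+.1f}  network={y:.3f}  Bayes={true_posterior(x):.3f}")
```

Because the two outputs of a two-class 1-of-M network are y and 1 - y, this also shows the outputs summing to one; with a cross-entropy cost in place of squared error, the same minimizer (the posterior) is obtained, as the abstract's proofs state.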
