Energy functions for minimizing misclassification error with minimum-complexity networks

Abstract: For automatic target recognition, a neural network is desired that minimizes the number of misclassifications with the minimum network complexity. Minimizing network complexity is important both for improving generalization and for simplifying implementation. The least mean squares (LMS) energy function used in standard backpropagation does not always produce such a network. Two minimum misclassification error (MME) energy functions are therefore proposed to achieve this goal. Examples are given in which LMS requires five times as many hidden units in a multilayer perceptron to reach test-set classification accuracy similar to that achieved with the MME functions. Further examples provide insight into the nature of LMS performance: LMS approximates the a posteriori probabilities, and class boundaries emerge only indirectly from this process. The examples also show that, for the same number of hidden units, the MME functions tend to become trapped in local minima less often than LMS. This is believed to be due to the difference in network complexity needed to accurately approximate a posteriori probabilities versus class boundaries.
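To make the contrast concrete, the sketch below compares the standard LMS energy (sum of squared errors against one-hot targets) with one plausible smooth surrogate for the misclassification count, namely a steep sigmoid of the margin between the correct-class output and the best competing output. This is only a minimal illustration under stated assumptions; the specific MME energy functions of the paper are not reproduced here, and the sigmoid-of-margin form and the sharpness parameter beta are assumptions for illustration.

```python
# Minimal sketch: LMS energy vs. a smooth misclassification-count surrogate.
# The MME surrogate below (sigmoid of the classification margin) is an
# assumed illustration, not the paper's exact energy function.
import numpy as np

def lms_energy(outputs, targets):
    """Least-mean-squares energy: sum of squared errors against one-hot targets."""
    return 0.5 * np.sum((outputs - targets) ** 2)

def mme_energy(outputs, labels, beta=10.0):
    """Smooth count of misclassifications.

    For each sample, compare the correct-class output with the best competing
    class; a steep sigmoid of that margin approaches 1 when the sample is
    misclassified and 0 when it is correctly classified.
    """
    n = outputs.shape[0]
    correct = outputs[np.arange(n), labels]
    masked = outputs.copy()
    masked[np.arange(n), labels] = -np.inf
    best_wrong = masked.max(axis=1)
    margin = correct - best_wrong              # > 0 means correctly classified
    return np.sum(1.0 / (1.0 + np.exp(beta * margin)))

# Usage with made-up network outputs for 3 samples and 2 classes.
outputs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
labels = np.array([0, 0, 1])                   # second sample is misclassified
targets = np.eye(2)[labels]
print(lms_energy(outputs, targets))            # penalizes every deviation from 0/1
print(mme_energy(outputs, labels))             # close to 1: one misclassification
```

The intended point of the contrast is the one made in the abstract: the LMS energy pushes outputs toward the 0/1 targets everywhere (an approximation of the a posteriori probabilities), whereas a misclassification-style energy only cares whether each sample falls on the correct side of the class boundary, which can require a less complex network.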
