Analysis of decision boundaries in linearly combined neural classifiers

Combining or integrating the outputs of several pattern classifiers has led to improved performance in a multitude of applications. This paper provides an analytical framework to quantify the improvements in classification results due to combining. We show that combining networks linearly in output space reduces the variance of the actual decision region boundaries around the optimum boundary. This result is valid under the assumption that the a posteriori probability distributions for each class are locally monotonic around the Bayes optimum boundary. In the absence of classifier bias, the error is shown to be proportional to the boundary variance, resulting in a simple expression for error-rate improvements. In the presence of bias, the error reduction, expressed in terms of a bias reduction factor, is shown to be less than or equal to the reduction obtained in the absence of bias. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space.
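The central claim, that averaging the outputs of several unbiased posterior estimators shrinks the variance of the resulting decision boundary around the Bayes optimum, can be illustrated with a small simulation. The sketch below is a minimal illustration under assumed conditions, not code from the paper: the 1-D Gaussian class model, the smooth sinusoidal noise standing in for each network's estimation error, and all function names are assumptions made for this example.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's code):
# averaging several noisy estimates of P(class 1 | x) reduces the squared
# deviation of the resulting decision boundary from the Bayes optimum.
import numpy as np

rng = np.random.default_rng(0)

# 1-D two-class problem: Gaussian class conditionals with equal priors
# and equal variances, so the Bayes boundary is the midpoint of the means.
mu0, mu1, sigma = -1.0, 1.0, 1.0
bayes_boundary = 0.5 * (mu0 + mu1)  # = 0.0

xs = np.linspace(-3.0, 3.0, 601)

def true_posterior(x):
    """Exact P(class 1 | x) for the Gaussian model above."""
    p0 = np.exp(-0.5 * ((x - mu0) / sigma) ** 2)
    p1 = np.exp(-0.5 * ((x - mu1) / sigma) ** 2)
    return p1 / (p0 + p1)

def noisy_estimator(noise_scale=0.05):
    """One 'trained classifier': the true posterior plus a smooth random
    perturbation standing in for a network's estimation error (an assumed
    noise model, not one derived in the paper)."""
    amps = rng.normal(scale=noise_scale, size=3)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=3)
    noise = sum(amps[k] * np.sin((k + 1) * xs + phases[k]) for k in range(3))
    return np.clip(true_posterior(xs) + noise, 0.0, 1.0)

def boundary_of(posterior):
    """Decision boundary: first x where the estimate crosses 0.5."""
    crossings = np.where(np.diff(np.sign(posterior - 0.5)))[0]
    return xs[crossings[0]] if len(crossings) else np.nan

def boundary_mse(n_combined, trials=2000):
    """Mean squared deviation of the combined boundary from the Bayes
    boundary; with unbiased estimators this equals the boundary variance."""
    deviations = []
    for _ in range(trials):
        estimates = [noisy_estimator() for _ in range(n_combined)]
        combined = np.mean(estimates, axis=0)  # linear combining in output space
        deviations.append(boundary_of(combined) - bayes_boundary)
    d = np.array(deviations)
    return np.nanmean(d ** 2)

for n in (1, 2, 5, 10):
    print(f"N={n:2d} combined classifiers: boundary variance ~ {boundary_mse(n):.5f}")
```

With independent, unbiased estimation errors, the measured boundary variance should fall roughly as 1/N in the number of combined classifiers, mirroring the error-rate improvement the paper derives for the bias-free case.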
