When Networks Disagree: Ensemble Methods for Hybrid Neural Networks

Abstract: This paper presents a general theoretical framework for ensemble methods that construct significantly improved regression estimates. Given a population of regression estimators, the authors construct a hybrid estimator that is at least as good, in the mean-squared-error sense, as any estimator in the population. They argue that the ensemble method presented has several properties: (1) it efficiently uses all the networks of a population -- none of the networks need to be discarded; (2) it efficiently uses all of the available data for training without over-fitting; (3) it inherently performs regularization by smoothing in functional space, which helps to avoid over-fitting; (4) it utilizes local minima to construct improved estimates, whereas other neural network algorithms are hindered by local minima; (5) it is ideally suited for parallel computation; (6) it leads to a very useful and natural measure of the number of distinct estimators in a population; and (7) the optimal parameters of the ensemble estimator are given in closed form. Experimental results show that the ensemble method dramatically improves neural network performance on difficult real-world optical character recognition tasks.
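The abstract states that the ensemble's optimal combination weights are available in closed form but does not reproduce the formula. Below is a minimal sketch of one standard closed-form construction (an assumption here, not quoted from the paper): the weights minimize mean squared error, subject to summing to one, via the inverse of the estimators' error-correlation matrix estimated on held-out data. The function and variable names are illustrative.

```python
import numpy as np

def ensemble_weights(preds, targets):
    """Closed-form combination weights for a population of regression
    estimators (a sketch; the specific formula is an assumed standard
    construction, not quoted from the paper).

    preds   : array of shape (n_estimators, n_samples), each estimator's
              predictions on held-out data
    targets : array of shape (n_samples,), true regression targets
    """
    errors = preds - targets                 # misfit of each estimator
    C = errors @ errors.T / errors.shape[1]  # error-correlation matrix
    C_inv = np.linalg.pinv(C)                # pseudo-inverse for stability
    w = C_inv.sum(axis=1) / C_inv.sum()      # weights constrained to sum to 1
    return w

def ensemble_predict(preds, w):
    """Weighted average of the population's predictions."""
    return w @ preds

# Toy usage: three noisy estimators of y = sin(x).
rng = np.random.default_rng(0)
x = np.linspace(0, np.pi, 200)
y = np.sin(x)
preds = np.stack([y + rng.normal(0, s, x.size) for s in (0.1, 0.2, 0.3)])
w = ensemble_weights(preds, y)
print("weights:", w)
print("ensemble MSE:", np.mean((ensemble_predict(preds, w) - y) ** 2))
print("best single MSE:", min(np.mean((p - y) ** 2) for p in preds))
```

Because the weights are chosen to minimize the quadratic error among all weight vectors summing to one, the combined estimator's error on the held-out data is never worse than that of any single member, which is consistent with the abstract's claim that no network in the population needs to be discarded.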
