Learning with ensembles: How overfitting can be useful

We study the characteristics of learning with ensembles. By solving exactly a simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually overfit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimizing the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown noise level in the training data.
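The core effect can be illustrated with a minimal sketch, not the paper's exact statistical-mechanics calculation: several under-regularized linear (ridge) students are trained on different random subsets of the data and their predictions are averaged, so the ensemble suppresses the variance that each overfitting student incurs individually. All names and parameter values below (`ridge_fit`, `lam`, `subset_size`, the teacher/student setup) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a noisy linear "teacher" in d dimensions.
d, n_train, n_test = 20, 100, 2000
w_teacher = rng.standard_normal(d)
noise = 0.5

X = rng.standard_normal((n_train, d))
y = X @ w_teacher + noise * rng.standard_normal(n_train)
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ w_teacher  # noise-free targets to measure generalization

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# An ensemble of K under-regularized students, each trained on a random
# subset of the data so the students decorrelate.
K, subset_size = 10, 60
lam = 1e-3  # far below the noise level: each student overfits its subset
students = []
for _ in range(K):
    idx = rng.choice(n_train, size=subset_size, replace=False)
    students.append(ridge_fit(X[idx], y[idx], lam))

preds = np.stack([X_test @ w for w in students])          # shape (K, n_test)
err_individual = np.mean((preds - y_test) ** 2, axis=1)   # per-student test MSE
err_ensemble = np.mean((preds.mean(axis=0) - y_test) ** 2)

print(f"mean individual test MSE: {err_individual.mean():.4f}")
print(f"ensemble test MSE:        {err_ensemble:.4f}")
```

In this sketch the ensemble's test error comes out well below the average individual error: averaging cancels the subset-specific fluctuations of the overfitting students, which is the qualitative behaviour the abstract describes for large ensembles of under-regularized students.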
