Clustering ensembles of neural network models

We show that large ensembles of (neural network) models, obtained e.g. through bootstrapping or by sampling from (Bayesian) probability distributions, can be effectively summarized by a relatively small number of representative models. In some cases this summary may even yield better function estimates. We present a method for finding representative models through clustering, based on the models' outputs on a data set. We apply the method to an ensemble of neural network models obtained by bootstrapping on the Boston housing data, and use the results to discuss bootstrapping in terms of bias and variance. A second application is the prediction of newspaper sales, where we learn a series of parallel tasks. The results indicate that it is not necessary to store all samples in the ensembles: a small number of representative models generally matches, or even surpasses, the performance of the full ensemble. The clustered representation of the ensemble thus obtained is much better suited to qualitative analysis and is shown to yield new insights into the data.

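The sketch below illustrates the general idea described above, under stated assumptions: it is not the authors' code, it uses plain k-means in place of the paper's deterministic-annealing clustering, and it uses synthetic regression data rather than the Boston housing set. All names and parameters (n_models, n_clusters, the MLP settings) are illustrative choices, not values from the paper. Each bootstrap-trained network is represented by its vector of outputs on a common data set; those vectors are clustered, and the member closest to each cluster centroid serves as that cluster's representative model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Toy regression data standing in for the Boston housing set.
X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)

# Bootstrap ensemble: each network is trained on a resampled data set.
rng = np.random.default_rng(0)
n_models, n_clusters = 50, 5
ensemble = []
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    net.fit(X[idx], y[idx])
    ensemble.append(net)

# Represent every model by its output vector on a common data set ...
outputs = np.array([net.predict(X) for net in ensemble])  # shape (n_models, n_samples)

# ... cluster those vectors, and pick one representative per cluster
# (the member whose outputs lie closest to the cluster centroid).
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(outputs)
representatives = []
for k in range(n_clusters):
    members = np.flatnonzero(km.labels_ == k)
    dists = np.linalg.norm(outputs[members] - km.cluster_centers_[k], axis=1)
    representatives.append(ensemble[members[np.argmin(dists)]])

# The small set of representatives replaces the full ensemble: predictions
# are averaged over 5 representative models instead of all 50 members.
y_full = outputs.mean(axis=0)
y_repr = np.mean([net.predict(X) for net in representatives], axis=0)
print("mean |full - representative| prediction gap:", np.abs(y_full - y_repr).mean())
```

In this simplified setting the averaged output of the few representatives tracks the full-ensemble average closely, which is the behaviour the abstract reports for the much smaller clustered summary.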