Managing Diversity in Regression Ensembles

Ensembles are a widely used and effective technique in machine learning; their success is commonly attributed to the degree of disagreement, or 'diversity', within the ensemble. For ensembles where the individual estimators output crisp class labels, this 'diversity' is not well understood and remains an open research issue. For ensembles of regression estimators, the diversity can be exactly formulated in terms of the covariance between individual estimator outputs, and the optimum level is expressed in terms of a bias-variance-covariance trade-off. Despite this, most approaches to learning ensembles use heuristics to encourage the right degree of diversity. In this work we show how to explicitly control diversity through the error function. The first contribution of this paper is to show that by taking the combination mechanism for the ensemble into account we can derive an error function for each individual that balances ensemble diversity with individual accuracy. We show the relationship between this error function and an existing algorithm called negative correlation learning, which uses a heuristic penalty term added to the mean squared error function. It is demonstrated that these methods control the bias-variance-covariance trade-off systematically, and can be utilised with any estimator capable of minimising a quadratic error function, for example MLPs or RBF networks. As a second contribution, we derive a strict upper bound on the coefficient of the penalty term, which holds for any estimator that can be cast in a generalised linear regression framework, with mild assumptions on the basis functions. Finally we present the results of an empirical study, showing significant improvements over simple ensemble learning, and finding that this technique is competitive with a variety of methods, including boosting, bagging, mixtures of experts, and Gaussian processes, on a number of tasks.
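To make the penalised error function concrete, the following is a minimal sketch of negative correlation learning for a regression ensemble. It trains each member on e_i = ½(f_i − y)² + λ·p_i with the standard penalty p_i = −(f_i − f̄)², using gradient descent on small generalised linear models over random Fourier features. The data, feature map, learning rate, and λ = 0.5 are all illustrative assumptions, not values from the paper; λ = 0 recovers independent training of each member.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: noisy sine wave (hypothetical data, for illustration).
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

M = 5      # ensemble size
lam = 0.5  # penalty coefficient; lam = 0 gives independent MSE training
lr = 0.05  # gradient-descent step size (illustrative)

def features(X, w, b):
    # Random Fourier features: one choice of basis functions for a
    # generalised linear regressor (an assumption; any estimator that
    # minimises a quadratic error function would do).
    return np.cos(X @ w + b)

Ws = [rng.normal(size=(1, 20)) for _ in range(M)]
bs = [rng.uniform(0, 2 * np.pi, size=20) for _ in range(M)]
thetas = [np.zeros(20) for _ in range(M)]

for _ in range(500):
    Phi = [features(X, Ws[i], bs[i]) for i in range(M)]
    f = np.stack([Phi[i] @ thetas[i] for i in range(M)])  # (M, N) member outputs
    fbar = f.mean(axis=0)                                 # ensemble (mean) output
    for i in range(M):
        # Gradient of e_i = 0.5*(f_i - y)**2 + lam * p_i, where
        # p_i = -(f_i - fbar)**2 and d fbar / d f_i = 1/M, giving
        # d p_i / d f_i = -2 * (1 - 1/M) * (f_i - fbar).
        # (Some formulations drop the (1 - 1/M) factor.)
        g = (f[i] - y) - 2 * lam * (1 - 1 / M) * (f[i] - fbar)
        thetas[i] -= lr * Phi[i].T @ g / len(y)

f_final = np.stack([features(X, Ws[i], bs[i]) @ thetas[i] for i in range(M)])
ensemble_mse = np.mean((f_final.mean(axis=0) - y) ** 2)
```

The penalty rewards each member for moving away from the ensemble mean, trading a little individual accuracy for lower covariance between members; the paper's upper bound on the coefficient delimits how large λ can be before training destabilises.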
