Fast meta-models for local fusion of multiple predictive models

Fusing the outputs of an ensemble of diverse predictive models usually boosts overall prediction accuracy. Such fusion is guided by each model's local performance, i.e., its prediction accuracy in the neighborhood of the probe point. Therefore, for each probe we instantiate a customized fusion mechanism. The fusion mechanism is a meta-model, i.e., a model that operates one level above the object-level models whose predictions we want to fuse. Like those models, the meta-model is defined by structural and parametric information. In this paper, we focus on defining the parametric information for a given structure. For each probe point, we either retrieve or compute the parameters needed to instantiate the associated meta-model. The retrieval approach is based on a CART-derived segmentation of the probe's state space, whose segments store the meta-model parameters. The computation approach is based on a run-time evaluation of each model's local performance in the neighborhood of the probe. We explore various structures for the meta-model, and for each structure we compare the pre-compiled (retrieval) and run-time (computation) approaches. We demonstrate this fusion methodology in the context of multiple neural network models, although it applies equally to other predictive modeling approaches. The method is illustrated in the development of highly accurate models for emissions, efficiency, and load prediction in a complex power plant, where locally weighted fusion boosts predictive performance by 30-50% over the baseline single-model approach across the various prediction targets. By comparison, typical fusion strategies based on averaging or global weighting produce only a 2-6% improvement over the same baseline.
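To make the run-time (computation) variant concrete, the sketch below weights each object-level model by the inverse of its mean absolute error on the k validation points nearest the probe, then fuses the individual predictions. It is a minimal illustration, assuming scikit-learn-style regressors with a `predict` method, a Euclidean neighborhood, and inverse-error weighting; the paper's actual meta-model structures and parameterizations may differ.

```python
import numpy as np

def locally_weighted_fusion(x_probe, models, X_val, y_val, k=20, eps=1e-6):
    """Fuse model predictions at a probe point, weighting each model by its
    local accuracy in the k-nearest-neighbor region of the probe.

    Illustrative sketch of the run-time (computation) approach; the
    neighborhood size, distance metric, and inverse-MAE weights are
    assumptions, not the paper's exact formulation.
    """
    # Local neighborhood: the k validation points closest to the probe.
    dists = np.linalg.norm(X_val - x_probe, axis=1)
    idx = np.argsort(dists)[:k]
    X_nbr, y_nbr = X_val[idx], y_val[idx]

    # Local performance of each object-level model on that neighborhood.
    local_mae = np.array(
        [np.mean(np.abs(m.predict(X_nbr) - y_nbr)) for m in models]
    )

    # More accurate locally -> larger fusion weight; normalize to sum to 1.
    w = 1.0 / (local_mae + eps)
    w /= w.sum()

    # Fused prediction: locally weighted combination of the model outputs.
    preds = np.array([m.predict(x_probe[None, :]).item() for m in models])
    return float(np.dot(w, preds))
```

The pre-compiled (retrieval) variant would instead compute such weights offline for each segment of a CART-derived partition of the state space and, at prediction time, simply look up the parameters stored with the segment that contains the probe.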
