Combining Predictors: Comparison of Five Meta Machine Learning Methods

Abstract For some years there has been a trend away from monolithic predictors and towards combining predictors. Two groups of meta machine learning (MML) methods are the ensemble methods and the Mixtures of Experts (ME) methods. In this article, five representatives are presented, discussed, and compared: three from the ensemble group and two from the ME group. The selected methods are the simple ensemble, AdaBoost, Bagging, Hierarchical MEs, and a variation on the ME method of R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton (Neural Computation 3 (1) (1991) 79–87), called Dynamic Coefficients (DynCo). DynCo can use any type of predictor that can be trained with gradient descent, has a powerful combination method, and encourages cooperation among the experts. DynCo compares favorably with the other four methods.
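The contrast between the two groups can be made concrete with a small sketch. Below is a minimal, illustrative implementation (not the paper's DynCo nor any specific method from the comparison) of a softmax-gated mixture of two linear experts trained jointly by gradient descent on a toy regression problem; the toy data, learning rate, and all variable names are assumptions made for illustration. After training, the gated (ME-style) prediction is compared against a plain unweighted average of the same experts, the simplest ensemble-style combination.

```python
import numpy as np

# Illustrative sketch only: a softmax-gated mixture of linear experts,
# not the paper's DynCo method. All settings below are assumptions.
rng = np.random.default_rng(0)

# Toy 1-D regression data with two regimes that one linear model fits poorly.
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.where(X[:, 0] < 0.0, -2.0 * X[:, 0], 3.0 * X[:, 0])

n_experts, d = 2, X.shape[1]
W = rng.normal(scale=0.1, size=(n_experts, d))   # expert weights
b = np.zeros(n_experts)                          # expert biases
V = rng.normal(scale=0.1, size=(n_experts, d))   # gate weights
c = np.zeros(n_experts)                          # gate biases
lr = 0.1

for step in range(2000):
    expert_out = X @ W.T + b                     # (N, n_experts)
    gate_logits = X @ V.T + c                    # (N, n_experts)
    gate_logits -= gate_logits.max(axis=1, keepdims=True)
    g = np.exp(gate_logits)
    g /= g.sum(axis=1, keepdims=True)            # gating coefficients, sum to 1

    y_hat = (g * expert_out).sum(axis=1)         # gate-weighted blend
    err = y_hat - y

    # Gradients of half the mean squared error w.r.t. expert and gate parameters.
    d_expert = err[:, None] * g                              # dL/d(expert_out)
    grad_W = d_expert.T @ X / len(X)
    grad_b = d_expert.mean(axis=0)
    d_gate = err[:, None] * g * (expert_out - y_hat[:, None])  # dL/d(gate_logits)
    grad_V = d_gate.T @ X / len(X)
    grad_c = d_gate.mean(axis=0)

    W -= lr * grad_W
    b -= lr * grad_b
    V -= lr * grad_V
    c -= lr * grad_c

# Compare the ME-style gated prediction with a simple unweighted average
# of the same experts (ensemble-style combination).
ens = expert_out.mean(axis=1)
print("gated mixture MSE:", np.mean((y_hat - y) ** 2))
print("plain average MSE:", np.mean((ens - y) ** 2))
```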

[1] Geoffrey E. Hinton, et al., An Alternative Model for Mixtures of Experts, NIPS, 1994.

[2] Michael I. Jordan, et al., Hierarchical Mixtures of Experts and the EM Algorithm, Neural Computation, 1994.

[3] Michael I. Jordan, et al., Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks, Cognitive Science, 1990.

[4] Lutz Prechelt, et al., PROBEN 1 - A Set of Benchmarks and Benchmarking Rules for Neural Network Training Algorithms, 1994.

[5] Yoav Freund, et al., Experiments with a New Boosting Algorithm, ICML, 1996.

[6] Christopher J. Merz, et al., UCI Repository of Machine Learning Databases, 1996.

[7] Harris Drucker, et al., Improving Regressors Using Boosting Techniques, ICML, 1997.

[8] Michael I. Jordan, et al., Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks, in Machine Learning: From Theory to Applications, 1993.

[9] Yoav Freund, et al., A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, EuroCOLT, 1997.

[10] Anders Krogh, et al., Learning with Ensembles: How Overfitting Can Be Useful, NIPS, 1995.

[11] Joydeep Ghosh, et al., Structural Adaptation in Mixture of Experts, Proceedings of the 13th International Conference on Pattern Recognition, 1996.

[12] Elie Bienenstock, et al., Neural Networks and the Bias/Variance Dilemma, Neural Computation, 1992.

[13] Anders Krogh, et al., Neural Network Ensembles, Cross Validation, and Active Learning, NIPS, 1994.

[14] Geoffrey E. Hinton, et al., Adaptive Mixtures of Local Experts, Neural Computation, 1991.