Sequential Model-Based Ensemble Optimization

One of the most tedious tasks in applying machine learning is model selection, i.e. hyperparameter selection. Fortunately, recent progress has been made in automating this process through sequential model-based optimization (SMBO) methods, which can optimize the cross-validation performance of a learning algorithm over the values of its hyperparameters. However, it is well known that ensembles of learned models almost always outperform a single model, even a properly selected one. In this paper, we therefore propose an extension of SMBO methods that automatically constructs such ensembles. The method builds on a recently proposed ensemble construction paradigm known as agnostic Bayesian learning. In experiments on 22 regression and 39 classification data sets, we confirm the effectiveness of the proposed approach, which outperforms model selection with SMBO.
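The two ingredients the abstract combines can be sketched as follows. This is a minimal, self-contained toy, not the paper's implementation: the one-dimensional loss function, the kernel-regression surrogate, and the greedy acquisition are all hypothetical simplifications (a real SMBO system would use, e.g., a Gaussian-process surrogate with an expected-improvement acquisition). The agnostic-Bayes step weights each evaluated configuration by the bootstrap-estimated probability that it is the true risk minimizer on the validation examples.

```python
import math
import random

random.seed(0)

# Toy stand-in for cross-validation: per-example "validation losses" of a
# model as a function of one hyperparameter x in [0, 1]. Hypothetical; the
# true optimum sits near x = 0.3.
def per_example_losses(x, n_examples=50):
    return [(x - 0.3) ** 2 + 0.05 * random.random() for _ in range(n_examples)]

# --- SMBO loop with a simple surrogate -------------------------------------
def surrogate_mean(x, observed):
    # Nadaraya-Watson kernel regression as a cheap stand-in for a GP surrogate.
    weights = [math.exp(-((x - xi) ** 2) / 0.02) for xi, _ in observed]
    total = sum(weights)
    if total < 1e-12:
        return max(y for _, y in observed)  # no nearby data: be pessimistic
    return sum(w * y for w, (_, y) in zip(weights, observed)) / total

observed = []      # (hyperparameter value, mean validation loss)
loss_tables = []   # per-example losses for every evaluated configuration
for _ in range(3):  # random initialization
    x = random.random()
    losses = per_example_losses(x)
    observed.append((x, sum(losses) / len(losses)))
    loss_tables.append(losses)

for _ in range(15):  # SMBO iterations
    candidates = [random.random() for _ in range(100)]
    # Greedy acquisition: evaluate the candidate the surrogate predicts best
    # (real SMBO would balance exploration via expected improvement).
    x = min(candidates, key=lambda c: surrogate_mean(c, observed))
    losses = per_example_losses(x)
    observed.append((x, sum(losses) / len(losses)))
    loss_tables.append(losses)

# --- Agnostic-Bayes ensemble weights ---------------------------------------
# Bootstrap the validation examples; a configuration's weight is the fraction
# of bootstrap replicates in which it achieves the lowest mean loss.
n_models = len(loss_tables)
n_examples = len(loss_tables[0])
wins = [0] * n_models
n_boot = 200
for _ in range(n_boot):
    idx = [random.randrange(n_examples) for _ in range(n_examples)]
    boot_means = [sum(t[i] for i in idx) / n_examples for t in loss_tables]
    wins[boot_means.index(min(boot_means))] += 1
weights = [w / n_boot for w in wins]  # ensemble weights, summing to 1
```

At prediction time, the ensemble would combine the evaluated models' outputs with these weights (a weighted vote for classification, a weighted average for regression), rather than keeping only the single best configuration.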
