Global optimization in machine learning: the design of a predictive analytics application

Global optimization, and Bayesian optimization in particular, has become the tool of choice for hyperparameter tuning and algorithm configuration, aimed at optimizing the generalization capability of machine learning algorithms. The contribution of this paper is to extend this approach to a complex algorithmic pipeline for predictive analytics, based on time-series clustering and artificial neural networks. The R software environment has been used with mlrMBO, a comprehensive and flexible toolbox for sequential model-based optimization. A random forest has been adopted as the surrogate model, due to the nature of the decision variables (i.e., conditional and discrete hyperparameters) in the case studies considered. Two acquisition functions, expected improvement and lower confidence bound, have been considered and their results compared. The computational results, on a benchmark and a real-world dataset, show that even in a complex search space, with up to 80 dimensions spanning integer, categorical, and conditional variables (i.e., hyperparameters), sequential model-based optimization is an effective solution, with lower confidence bound requiring fewer function evaluations than expected improvement to reach the same optimal solution.
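The sequential model-based optimization loop described above (fit a surrogate on the evaluations so far, minimize an acquisition function, evaluate the suggested point, repeat) can be illustrated with a minimal sketch. This is a toy stand-in, not the paper's R/mlrMBO pipeline: the objective function, search space, and iteration budget below are all invented for illustration, and the per-tree spread of a scikit-learn random forest is used as a cheap uncertainty estimate for the lower confidence bound.

```python
# Minimal SMBO sketch: random forest surrogate + lower confidence bound.
# Illustrative only; the quadratic "objective" is a made-up stand-in for
# an expensive black-box evaluation such as a pipeline's validation error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def objective(x):
    # stand-in for an expensive, noisy black-box evaluation
    return (x - 7) ** 2 + rng.normal(scale=0.5)

# integer search space, as in hyperparameter tuning
candidates = np.arange(0, 21, dtype=float).reshape(-1, 1)

# initial design: a few random evaluations
X = rng.choice(21, size=5, replace=False).reshape(-1, 1).astype(float)
y = np.array([objective(float(x)) for x in X.ravel()])

for _ in range(15):  # sequential iterations
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(X, y)
    # per-tree predictions give a cheap mean / spread estimate
    preds = np.stack([t.predict(candidates) for t in surrogate.estimators_])
    mu, sigma = preds.mean(axis=0), preds.std(axis=0)
    # lower confidence bound: exploit low mean, explore high uncertainty
    lcb = mu - 2.0 * sigma
    x_next = float(candidates[np.argmin(lcb), 0])
    X = np.vstack([X, [[x_next]]])
    y = np.append(y, objective(x_next))

best = float(X[np.argmin(y), 0])
print(best)  # should land near the minimum at x = 7
```

Swapping the lower confidence bound for expected improvement only changes the acquisition line; in mlrMBO the analogous switch is between the CB and EI infill criteria, which is how the two acquisition functions are compared in the paper.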
