Hyperparameter tuning is one of the most time-consuming parts of machine learning: the performance of a large number of hyperparameter settings has to be evaluated to find the best one. Although modern optimization algorithms minimize the number of evaluations needed, evaluating a single setting is still expensive: using a resampling technique, the machine learning method has to be fitted a fixed number of K times on different training data sets, and the mean performance over the K fits is used as the estimator of the setting's performance. Many hyperparameter settings could be discarded after fewer than K resampling iterations because they are already clearly inferior to high-performing settings; in practice, however, the resampling is often carried out to the very end, wasting considerable computational effort. We propose a sequential testing procedure that minimizes the number of resampling iterations needed to detect inferior parameter settings. To this end, we first analyze the distribution of resampling errors and find that a log-normal distribution is a promising candidate. We then build a sequential testing procedure assuming this distribution and embed it in a random search algorithm. We compare standard random search with our enhanced sequential random search in several realistic data situations. The sequential random search finds comparably good hyperparameter settings, while the computational time needed to find them is roughly halved.
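To illustrate the idea, the following is a minimal Python sketch of such a sequential random search: after each resampling iteration, a sequential test on the log-transformed errors (motivated by the log-normal assumption) checks whether the current setting is already clearly worse than the incumbent and discards it early if so. The search space, the one-sided t-test stopping rule, and all thresholds are illustrative assumptions, not the exact procedure from the paper.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import zero_one_loss
from sklearn.model_selection import StratifiedKFold

K = 10        # fixed number of resampling iterations per setting
ALPHA = 0.05  # significance level of the sequential test (assumed)

def sequential_random_search(X, y, n_candidates=20, seed=0):
    """Random search with early discarding via a sequential test (sketch)."""
    rng = np.random.default_rng(seed)
    folds = list(StratifiedKFold(n_splits=K, shuffle=True,
                                 random_state=seed).split(X, y))
    best_errors, best_params = None, None
    for _ in range(n_candidates):
        # Draw a random hyperparameter setting (illustrative search space).
        params = {"max_features": float(rng.uniform(0.1, 1.0)),
                  "min_samples_leaf": int(rng.integers(1, 20))}
        errors = []
        for train, test in folds:
            model = RandomForestClassifier(n_estimators=100,
                                           random_state=seed, **params)
            model.fit(X[train], y[train])
            errors.append(zero_one_loss(y[test], model.predict(X[test])))
            # Sequential test: once a few iterations are available, check
            # whether this setting is already clearly worse than the
            # incumbent. The one-sided t-test on log errors is an assumed
            # stand-in for the paper's log-normal-based procedure.
            if best_errors is not None and len(errors) >= 3:
                log_new = np.log(np.asarray(errors) + 1e-12)
                log_best = np.log(np.asarray(best_errors) + 1e-12)
                _, p = stats.ttest_ind(log_new, log_best,
                                       alternative="greater")
                if p < ALPHA:
                    break  # clearly inferior: stop resampling early
        else:
            # Survived all K iterations: update the incumbent if better.
            if best_errors is None or np.mean(errors) < np.mean(best_errors):
                best_errors, best_params = errors, params
    return best_params, None if best_errors is None else float(np.mean(best_errors))
```

The for/else construct ensures that only settings which survive all K iterations can become the incumbent, so the first candidate is always evaluated fully and later sequential tests compare against a complete reference sample.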