Lazy Paired Hyper-Parameter Tuning

Virtually all machine learning applications require hyper-parameter tuning to maximize predictive accuracy. Such tuning is computationally expensive, and the cost is compounded by the need for multiple evaluations (via cross-validation or bootstrap) of each configuration to obtain statistically significant results. This paper presents a simple, general technique for improving the efficiency of hyper-parameter tuning by minimizing the number of resampled evaluations per configuration. We exploit the fact that train-test samples can easily be matched across candidate hyper-parameter configurations. This matching permits paired hypothesis tests and power analysis, which allow statistically sound early elimination of suboptimal candidates and thereby reduce the number of evaluations. Results on synthetic and real-world datasets show that our method outperforms competing methods for discrete parameter settings and enhances state-of-the-art techniques for continuous parameter settings.
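The idea of matching train-test samples across configurations and eliminating weak candidates early can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it uses a plain paired t statistic with a fixed, hypothetical cutoff `t_crit`, and omits the power analysis and any multiple-testing correction. The names `lazy_paired_race`, `evaluate`, `min_folds`, and the synthetic scores are all assumptions made for illustration.

```python
import math
import random
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean per-fold difference a - b (matched folds)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # All differences identical: treat any nonzero gap as decisive.
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(len(diffs)))


def lazy_paired_race(evaluate, configs, max_folds=20, min_folds=3, t_crit=3.0):
    """Evaluate configs fold by fold; drop those clearly worse than the leader.

    evaluate(config, fold) must use the SAME train-test split for a given
    fold index regardless of config, so per-fold scores are paired.
    t_crit is an illustrative fixed cutoff, not a calibrated critical value.
    """
    scores = {c: [] for c in configs}
    alive = list(configs)
    n_evals = 0
    for fold in range(max_folds):
        if len(alive) == 1:
            break  # nothing left to compare; stop spending evaluations
        for c in alive:
            scores[c].append(evaluate(c, fold))
            n_evals += 1
        if fold + 1 < min_folds:
            continue  # too few paired samples for a variance estimate
        leader = max(alive, key=lambda c: statistics.fmean(scores[c]))
        alive = [
            c for c in alive
            if c == leader
            or paired_t_statistic(scores[leader], scores[c]) < t_crit
        ]
    best = max(alive, key=lambda c: statistics.fmean(scores[c]))
    return best, n_evals


# Synthetic demo: each fold has a shared difficulty effect that pairing cancels.
rng = random.Random(0)
quality = {0.01: 0.70, 0.1: 0.80, 1.0: 0.78, 10.0: 0.60}  # made-up accuracies
fold_effect = [rng.gauss(0, 0.05) for _ in range(20)]
noise = {(c, f): rng.gauss(0, 0.005) for c in quality for f in range(20)}


def evaluate(config, fold):
    return quality[config] + fold_effect[fold] + noise[(config, fold)]


best, n_evals = lazy_paired_race(evaluate, list(quality))
print(best, n_evals)
```

Because the fold-level effect is shared by every configuration, it drops out of the per-fold differences, so the paired statistic can separate candidates after far fewer folds than an unpaired comparison would need. In real use, `evaluate` would train and score a model on a fixed, indexed set of resampling splits.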
