Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework

The performance of many algorithms in hard combinatorial problem solving, machine learning, and AI in general depends on parameter tuning. Automated methods have been proposed to relieve users of the tedious and error-prone task of manually searching for performance-optimized configurations across a set of problem instances. However, there remains untapped potential in adjusting an algorithm's parameters online, since different parameter values can be optimal at different stages of the algorithm's execution. Prior work showed that reinforcement learning (RL) is an effective, data-driven approach for learning policies that adjust algorithm parameters online. We extend that approach by formulating the resulting dynamic algorithm configuration problem as a contextual Markov decision process (MDP), so that RL learns a policy not just for a single instance but across a set of instances. To lay the foundation for studying dynamic algorithm configuration with RL in a controlled setting, we propose white-box benchmarks covering the major aspects that make dynamic algorithm configuration hard in practice and study the performance of various types of configuration strategies on them. On these white-box benchmarks, we show that (i) RL is a robust candidate for learning configuration policies and outperforms standard parameter optimization approaches such as classical algorithm configuration; (ii) with function approximation, RL agents can learn to generalize to new types of instances; and (iii) self-paced learning can substantially improve performance by automatically selecting a useful sequence of training instances.
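The contextual-MDP view can be made concrete with a small sketch: each episode samples a problem instance (the context), the state carries the instance and the current step, and a single RL agent learns one configuration policy across all instances. The sketch below uses tabular Q-learning on a toy white-box-style benchmark whose optimal parameter value drifts over time and is shifted per instance; it is only an illustration under these assumptions, not the paper's benchmarks or agents, and all names (ToyDACEnv, q_learning, the reward shape) are hypothetical.

```python
# Minimal sketch of dynamic algorithm configuration as a contextual MDP.
# Each episode draws an instance (the context); at every step the agent
# picks a parameter value and receives a reward that depends on the
# instance, the step, and the chosen value.  Names and reward are toy.
import random
from collections import defaultdict


class ToyDACEnv:
    """Toy white-box benchmark: the optimal parameter value drifts with the
    time step and is shifted per instance."""

    def __init__(self, instances, horizon=10, n_values=5):
        self.instances = instances  # instance features (plain ints here)
        self.horizon = horizon
        self.n_values = n_values

    def reset(self):
        self.instance = random.choice(self.instances)
        self.t = 0
        return (self.instance, self.t)

    def step(self, action):
        # Reward is highest when the chosen value matches the moving optimum.
        optimal = (self.instance + self.t) % self.n_values
        reward = -abs(action - optimal)
        self.t += 1
        done = self.t >= self.horizon
        return (self.instance, self.t), reward, done


def q_learning(env, episodes=5000, eps=0.1, alpha=0.1, gamma=0.99):
    """Tabular Q-learning over (instance, step) states, i.e. one policy
    learned across the whole instance set."""
    q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:
                action = random.randrange(env.n_values)
            else:
                action = max(range(env.n_values), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in range(env.n_values))
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    env = ToyDACEnv(instances=[0, 1, 2, 3])
    q = q_learning(env)
    # Greedy parameter choice for instance 2 at step 3 should track the
    # moving optimum, which is (2 + 3) % 5 = 0 in this toy reward.
    print(max(range(env.n_values), key=lambda a: q[((2, 3), a)]))
```

In the paper's setting the tabular table would be replaced by function approximation (e.g. a deep Q-network) so that the policy can generalize to unseen instances, and the order in which training instances are sampled could itself be adapted, which is where self-paced learning enters.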
