Algorithm Selection as a Bandit Problem with Unbounded Losses

Algorithm selection is typically based on models of algorithm performance learned during a separate offline training phase, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. The resulting exploration-exploitation trade-off was represented as a bandit problem with expert advice and addressed with an existing solver for that game; this, however, required setting an arbitrary bound on algorithm runtimes, invalidating the solver's optimal regret guarantee. In this paper, we propose a simpler framework that represents algorithm selection as a bandit problem with partial information and an unknown bound on losses. We adapt an existing solver to this game, proving a bound on its expected regret that also holds for the resulting algorithm selection technique. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark.
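To make the setting concrete, the sketch below shows one way such a selection loop could look: an Exp3-style exponential-weighting bandit combined with a doubling trick on the loss scale, choosing one solver per instance and treating its measured runtime as an unbounded loss observed under partial information. This is an illustrative assumption, not the exact solver adapted in the paper; the class UnboundedLossBandit, its parameter choices, and the run_solver callback are hypothetical names introduced here.

import math
import random

class UnboundedLossBandit:
    """Exp3-style bandit with a doubling trick on the loss scale, so no
    a-priori bound on losses (runtimes) is needed."""

    def __init__(self, n_arms, horizon):
        self.n = n_arms
        self.horizon = horizon
        self.scale = 1.0                    # current guess of the loss bound
        self.cum_loss = [0.0] * n_arms      # importance-weighted cumulative losses

    def _probs(self):
        # Exponential weights on rescaled loss estimates, mixed with
        # uniform exploration (standard Exp3-style parameters).
        eta = math.sqrt(2.0 * math.log(self.n) / (self.n * self.horizon))
        gamma = min(1.0, math.sqrt(self.n * math.log(self.n) / self.horizon))
        m = min(self.cum_loss)
        w = [math.exp(-eta * (l - m) / self.scale) for l in self.cum_loss]
        z = sum(w)
        return [(1.0 - gamma) * wi / z + gamma / self.n for wi in w]

    def select(self):
        # Sample an arm (an algorithm) from the current distribution.
        p = self._probs()
        r, acc = random.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i, p
        return self.n - 1, p

    def update(self, arm, loss, probs):
        # Doubling trick: grow the scale whenever an observed loss exceeds it,
        # so rescaled losses stay in [0, 1] without a fixed runtime bound.
        while loss > self.scale:
            self.scale *= 2.0
        # Unbiased importance-weighted estimate for the played arm only
        # (partial information: the other arms' losses are not observed).
        self.cum_loss[arm] += loss / probs[arm]

def select_solvers_online(solvers, instances, run_solver):
    # Hypothetical driver: one solver is chosen per instance and its
    # measured runtime is fed back to the bandit as the loss.
    bandit = UnboundedLossBandit(len(solvers), horizon=len(instances))
    for inst in instances:
        arm, probs = bandit.select()
        runtime = run_solver(solvers[arm], inst)   # loss = runtime, unbounded
        bandit.update(arm, runtime, probs)

The doubling trick is only one way to cope with an unknown loss bound; whatever rescaling is used, the per-instance loop above (sample a solver, observe only its runtime, update the weights) is the essence of the online selection setting the abstract describes.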
