Reinforcement Learning for Algorithm Selection

Many computational problems can be solved by multiple algorithms, with different algorithms fastest for different problem sizes, input distributions, and hardware characteristics. We consider the problem of algorithm selection: dynamically choosing an algorithm to attack each instance, or subinstance arising from recursive calls, of a problem, with the goal of minimizing the overall execution time. We formulate the problem as a kind of Markov Decision Process (MDP) and use ideas from reinforcement learning (RL) to solve it. The state of the process consists of a set of instance features, such as problem size. The actions are the different algorithms we can choose from. Non-recursive algorithms are terminal in that they solve the problem completely, leading to a terminal state. Recursive algorithms create subproblems and therefore cause transitions to other states, making this a sequential decision task. The immediate cost of a decision is the real time taken to execute the selected algorithm on the current instance, excluding time spent in recursive calls; the total (undiscounted) cost over an episode is therefore the time taken to solve the problem. The goal is a policy that minimizes this total cost. The process differs from a standard MDP in that it allows one-to-many state transitions, since one level may make multiple recursive calls.

Our initial experiments focus on the problem of order statistic selection: given an array of unordered numbers and some index i, select the number that would rank i-th if the array were sorted. We picked two algorithms such that neither is best in all cases; otherwise, learning would not help. DETERMINISTIC SELECT(i) is a recursive algorithm, and HEAP SELECT(i) is a non-recursive algorithm; sketches of both, and of a simple learning loop, follow below.
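For concreteness, here is a minimal Python sketch of the two algorithms. The details are illustrative rather than the exact implementations used in the experiments: ranks are 1-indexed, and the choose callback is a stand-in for the per-subinstance algorithm choice made by the policy.

    import heapq

    def heap_select(a, i):
        # Non-recursive ("terminal") algorithm: heapify the whole array and
        # pop the (i-1) smallest elements; the i-th smallest is then on top.
        h = list(a)
        heapq.heapify(h)
        for _ in range(i - 1):
            heapq.heappop(h)
        return h[0]

    def deterministic_select(a, i, choose):
        # Recursive algorithm (median-of-medians pivot). Each call spawns
        # recursive subinstances -- the one-to-many state transitions of the
        # process -- and `choose` decides which algorithm handles each one.
        if len(a) <= 5:
            return sorted(a)[i - 1]
        groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
        medians = [g[len(g) // 2] for g in groups]
        pivot = choose(medians, (len(medians) + 1) // 2)
        lo = [x for x in a if x < pivot]
        hi = [x for x in a if x > pivot]
        eq = len(a) - len(lo) - len(hi)
        if i <= len(lo):
            return choose(lo, i)
        if i <= len(lo) + eq:
            return pivot
        return choose(hi, i - len(lo) - eq)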
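The learning loop below ties these sketches to the formulation: the state is the discretized problem size, the actions are the two algorithms, and the immediate cost is measured wall-clock time. This sketch uses an epsilon-greedy policy with a simple Monte Carlo update toward the observed time-to-solve, a simplification rather than the exact learning method evaluated here; the power-of-two size bucketing and the constants ALPHA and EPSILON are likewise illustrative assumptions.

    import random
    import time
    from collections import defaultdict

    ACTIONS = ("heap", "det")
    ALPHA, EPSILON = 0.2, 0.1   # illustrative step size and exploration rate
    Q = defaultdict(float)      # Q[(state, action)] = estimated time-to-solve

    def state(n):
        # State features: here only the problem size, discretized into
        # power-of-two buckets so estimates generalize across nearby sizes.
        return n.bit_length()

    def rl_select(a, i):
        # One decision of the process: pick an algorithm for this (sub)instance.
        s = state(len(a))
        if random.random() < EPSILON:
            act = random.choice(ACTIONS)
        else:
            act = min(ACTIONS, key=lambda x: Q[(s, x)])  # costs, so minimize
        t0 = time.perf_counter()
        if act == "heap":
            result = heap_select(a, i)                   # terminal: no successors
        else:
            result = deterministic_select(a, i, rl_select)  # subinstances re-enter here
        elapsed = time.perf_counter() - t0               # total cost from this state
        # Monte Carlo backup toward the observed time-to-solve this (sub)instance.
        Q[(s, act)] += ALPHA * (elapsed - Q[(s, act)])
        return result

Training then amounts to repeatedly calling rl_select on instances drawn from the target distribution; setting EPSILON to zero afterwards yields the greedy learned policy. Because Q-values start at zero and observed costs are positive, untried actions look cheapest, which provides some initial exploration for free.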