Branching bandits and Klimov's problem: achievable region and side constraints

We consider the average cost branching bandits problem and its special case known as Klimov's problem. Let n denote the vector whose components are the mean numbers of bandits (or customers) of each type that are present. We fully characterize the achievable region, that is, the set of all vectors n that can be obtained by considering all possible policies. While the original description of the achievable region involves exponentially many constraints, we also develop an alternative description that involves only O(R^2) variables and constraints, where R is the number of bandit types (or customer classes). We then consider the problem of minimizing a linear function of n subject to L additional linear constraints on n. We show that optimal policies can be obtained by randomizing between L+1 strict priority policies, which can be found efficiently (in polynomial time) using linear programming techniques.
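
As a small illustration of the randomization result, the sketch below treats the simplest case of R = 2 classes and L = 1 side constraint, where the optimum is a mixture of L+1 = 2 strict priority policies. It is not the linear programming method of the paper: the function name mix_two_priority_policies and all numerical values are hypothetical, and the performance vectors of the two priority policies are assumed to be given rather than computed from a model.

```python
from fractions import Fraction


def mix_two_priority_policies(n_a, n_b, cost, side_coeffs, side_bound):
    """Minimize cost . n over n = p*n_a + (1-p)*n_b with 0 <= p <= 1,
    subject to the single side constraint side_coeffs . n <= side_bound.

    n_a and n_b are the (assumed known) performance vectors of two strict
    priority policies; p is the probability of using the first one.
    Returns (p, n), or None if no mixture satisfies the side constraint.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    cost_a, cost_b = dot(cost, n_a), dot(cost, n_b)
    side_a, side_b = dot(side_coeffs, n_a), dot(side_coeffs, n_b)

    # The side constraint is linear in p:
    #   p * (side_a - side_b) <= side_bound - side_b
    lo, hi = Fraction(0), Fraction(1)
    slope = side_a - side_b
    slack = side_bound - side_b
    if slope > 0:
        hi = min(hi, Fraction(slack) / slope)
    elif slope < 0:
        lo = max(lo, Fraction(slack) / slope)
    elif slack < 0:
        return None  # constraint violated for every p
    if lo > hi:
        return None  # no feasible mixing probability

    # The objective is also linear in p, so an endpoint of [lo, hi] is optimal.
    p = hi if cost_a < cost_b else lo
    n = [p * x + (1 - p) * y for x, y in zip(n_a, n_b)]
    return p, n


# Hypothetical data: policy A gives priority to class 1, policy B to class 2.
n_a = [1, 4]          # mean numbers present under policy A (made-up values)
n_b = [3, 2]          # mean numbers present under policy B (made-up values)
cost = [3, 1]         # linear objective: 3*n_1 + n_2
side_coeffs = [0, 1]  # side constraint: n_2 <= 3
side_bound = 3

p, n = mix_two_priority_policies(n_a, n_b, cost, side_coeffs, side_bound)
print(p, n)
```

With these made-up values, policy A alone is cheaper but violates the side constraint, so the optimum uses each policy half of the time. For general R and L, enumerating priority policies in this way would not be efficient, which is the gap that the polynomial-size description of the achievable region addresses.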