Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

A Proofs of Main Theorems

A.1 Proof of Lemma 1

Let $R_t = R(A_t, w_t)$ be the stochastic regret of CombUCB1 at time $t$, where $A_t$ and $w_t$ are the solution and the weights of the items at time $t$, respectively. Furthermore, let $\mathcal{E}_t = \{\exists e \in E : |\bar{w}(e) - \hat{w}_{T_{t-1}(e)}(e)| \geq c_{t-1, T_{t-1}(e)}\}$ be the event that $\bar{w}(e)$ is outside of the high-probability confidence interval around $\hat{w}_{T_{t-1}(e)}(e)$ for some item $e$ at time $t$; and let $\bar{\mathcal{E}}_t$ be the complement of $\mathcal{E}_t$, that is, the event that $\bar{w}(e)$ is in the high-probability confidence interval around $\hat{w}_{T_{t-1}(e)}(e)$ for all items $e$ at time $t$. Then we can decompose the regret of CombUCB1 as: $R(n) = \mathbb{E}\Big[\sum^{t_0 - 1}$