25th Annual Conference on Learning Theory

Analysis of Thompson Sampling for the Multi-armed Bandit Problem