Paper Information - Tight Bounds for Bandit Combinatorial Optimization

We revisit the study of optimal regret rates in bandit combinatorial optimization, a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss. This disproves a conjecture of Audibert, Bubeck, and Lugosi (2013), who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework and, in particular, imply a tight bound for the well-studied bandit shortest path problem. This also resolves an open problem posed by Cesa-Bianchi and Lugosi (2012).
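
As a rough illustration of the gap between the two rates, assuming the logarithmic factors hidden by the tilde notation are ignored, their ratio is

$$\frac{k^{3/2}\sqrt{dT}}{k\sqrt{dT}} = \sqrt{k},$$

so the proven rate exceeds the conjectured one by a factor of order $\sqrt{k}$, independent of $d$ and $T$. For instance, with $k = 100$ the two rates differ by a factor of about $10$.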

[1] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.

[2] Gergely Neu, et al. First-order regret bounds for combinatorial semi-bandits, 2015, COLT.

[3] Manfred K. Warmuth, et al. Learning Permutations with Exponential Weights, 2007, COLT.

[4] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[5] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.

[6] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[7] Gábor Lugosi, et al. Regret in Online Combinatorial Optimization, 2012, Math. Oper. Res.

[8] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.

[9] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[10] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.

[11] Robert E. Schapire, et al. Non-Stochastic Bandit Slate Problems, 2010, NIPS.

[12] Manfred K. Warmuth, et al. Path Kernels and Multiplicative Updates, 2002, J. Mach. Learn. Res.

[13] Tamás Linder, et al. The On-Line Shortest Path Problem Under Partial Monitoring, 2007, J. Mach. Learn. Res.

[14] Sham M. Kakade, et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback, 2012, COLT.

[15] Ohad Shamir, et al. On the Complexity of Bandit Linear Optimization, 2014, COLT.

[16] Elad Hazan, et al. Volumetric Spanners: An Efficient Exploration Basis for Learning, 2013, J. Mach. Learn. Res.

[17] Gergely Neu, et al. Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits, 2015, J. Mach. Learn. Res.

[18] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.