An efficient high-probability algorithm for Linear Bandits

For the linear bandit problem, we extend the analysis of the CombEXP algorithm of Combes et al. [13] to the high-probability setting against adaptive adversaries, allowing actions to come from an arbitrary polytope. We prove a high-probability regret bound of \(O(T^{2/3})\) over a time horizon \(T\). While this bound is weaker than the optimal \(O(\sqrt{T})\) bound achieved by the GeometricHedge algorithm of Bartlett et al. [2], CombEXP is computationally efficient, requiring only an efficient linear optimization oracle over the convex hull of the action set.
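To make the setting concrete, here is a minimal sketch of an exponential-weights loop for bandit linear optimization with a least-squares loss estimator, in the spirit of CombEXP. It is illustrative only: for simplicity it enumerates the polytope's vertices, which is exactly what CombEXP avoids by combining the linear optimization oracle with mirror descent over the convex hull and a Carathéodory-style decomposition; the function name, interface, and parameter values are assumptions, not the paper's implementation.

```python
import numpy as np

def bandit_linear_sketch(vertices, loss_fn, T, eta=0.1, gamma=0.1, seed=0):
    """Exponential weights over the vertices of an action polytope.

    vertices: (K, d) array of polytope vertices (toy interface; the
    oracle-efficient algorithm never enumerates these explicitly).
    loss_fn(t): returns the adversary's d-dimensional loss vector at round t.
    """
    rng = np.random.default_rng(seed)
    K, d = vertices.shape
    w = np.ones(K)                                  # weights over vertices
    cum_loss = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K   # mix in uniform exploration
        i = rng.choice(K, p=p)                      # play a random vertex
        obs = vertices[i] @ loss_fn(t)              # bandit feedback: one scalar
        cum_loss += obs
        # Least-squares estimate of the full loss vector:
        # hat_l = Sigma^+ x_i (x_i^T l_t) with Sigma = sum_k p_k x_k x_k^T,
        # which is unbiased on the span of the action set.
        Sigma = (vertices * p[:, None]).T @ vertices
        hat_l = np.linalg.pinv(Sigma) @ vertices[i] * obs
        w *= np.exp(-eta * (vertices @ hat_l))      # multiplicative update
    return cum_loss
```

On the probability simplex (`vertices = np.eye(d)`), Sigma is diagonal and the estimator reduces to the familiar Exp3-style importance-weighted estimate \(\hat{\ell}_i = \ell_i \mathbb{1}\{i_t = i\}/p_i\).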

References

[1] Renato Paes Leme et al. Tight Bounds for Approximate Carathéodory and Beyond. ICML, 2015.

[2] Thomas P. Hayes et al. High-Probability Regret Bounds for Bandit Online Linear Optimization. COLT, 2008.

[3] Elad Hazan et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. COLT, 2008.

[4] Baruch Awerbuch et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. STOC, 2004.

[5] Thomas Rothvoß. The matching polytope has exponential extension complexity. STOC, 2013.

[6] Jack Edmonds. Maximum matching and a polyhedron with 0,1-vertices, 1965.

[7] Gergely Neu et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits. NIPS, 2015.

[8] Yuanzhi Li et al. An optimal algorithm for bandit convex optimization. arXiv, 2016.

[9] Stephen J. Wright et al. Efficient Bregman Projections onto the Permutahedron and Related Polytopes. AISTATS, 2016.

[10] Yin Tat Lee et al. Kernel-based methods for bandit convex optimization. STOC, 2016.

[11] Xiequan Fan et al. Hoeffding's inequality for supermartingales. arXiv:1109.4359, 2011.

[12] Martin Jaggi et al. On the Global Linear Convergence of Frank-Wolfe Optimization Variants. NIPS, 2015.

[13] Alexandre Proutière et al. Combinatorial Bandits Revisited. NIPS, 2015.

[14] Ambuj Tewari et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret. ICML, 2012.

[15] Thomas P. Hayes et al. The Price of Bandit Information for Online Optimization. NIPS, 2007.

[16] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem. SIAM J. Comput., 2002.

[17] Gábor Lugosi et al. Regret in Online Combinatorial Optimization. Math. Oper. Res., 2012.

[18] Martin Jaggi et al. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[19] Sébastien Bubeck et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Found. Trends Mach. Learn., 2012.

[20] A. Nemirovski. Advances in convex optimization: conic programming, 2005.

[21] Nicolò Cesa-Bianchi et al. Combinatorial Bandits. COLT, 2012.

[22] Sébastien Bubeck et al. Multi-scale exploration of convex functions and bandit convex optimization. COLT, 2015.

[23] Elad Hazan et al. Volumetric Spanners: An Efficient Exploration Basis for Learning. J. Mach. Learn. Res., 2013.

[24] Thomas P. Hayes et al. How to Beat the Adaptive Multi-Armed Bandit. arXiv, 2006.

[25] Michael I. Jordan et al., editors. Advances in Neural Information Processing Systems 30, 1995.