Tight Lower Bounds for the Multiplicative Weights Algorithm on the Experts Problem

We develop tight lower bounds on the regret achievable by the widely used Multiplicative Weights Algorithm (MWA) for the fundamental problem of prediction with expert advice. It is well known that with $k$ experts and $T$ steps, MWA suffers a (minimax) regret of at most $\sqrt{\frac{T \ln k}{2}}$ [7]. In this work, we show that as $T \to \infty$, MWA suffers a regret of at least $\sqrt{\frac{T \ln k}{2}}$ for every even $k$, thereby precisely characterizing MWA's regret. We develop an almost identical characterization for every odd $k > 1$, where we show that as $T \to \infty$, MWA suffers a regret of at least $\sqrt{\frac{T \ln k}{2}}\left(1 - \frac{1}{k^2}\right)$. The previously best known lower bound on MWA's regret was $\frac{1}{4}\sqrt{T \log_2 k}$ [10], which is a factor of 2.35 smaller than the optimal regret. In the equally natural geometric horizon model, where the number of steps is a geometric random variable with mean $1/\delta$, it is known that MWA suffers a regret of at most $\sqrt{\frac{\ln k}{2\delta}}$. For every $k \geq 2$, we show that as $\delta \to 0$, MWA suffers a regret of at least $\frac{1}{2}\sqrt{\frac{\ln k}{2\delta}}$, thereby shrinking the gap between the known upper and lower bounds to a factor of 2. For the special case of $k = 2$ experts, we close the gap completely and precisely characterize MWA's regret as $\frac{0.391}{\sqrt{\delta}}$ as $\delta \to 0$.

We obtain the lower bounds by characterizing the structure of the optimal adversary. We show that the optimal adversary is obtained by a careful combination of two simple adversaries that exploit the weaknesses of MWA in opposite ways. Strikingly, despite the apparent similarity between the finite and geometric horizon models, we show that the structures of the optimal adversaries for the two models are mirror images of each other in a strong sense.

∗ Microsoft Research. One Memorial Drive, Cambridge, MA 02142. ngravin@gmail.com.
† Microsoft Research. One Microsoft Way, Redmond, WA 98052. peres@microsoft.com.
‡ Google Research. 111 8th Ave, New York, NY 10011. balusivan@google.com.
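For context, the factor of 2.35 follows from the ratio of the two bounds: $\sqrt{\frac{T \ln k}{2}} \big/ \frac{1}{4}\sqrt{T \log_2 k} = 4\sqrt{\frac{\ln 2}{2}} \approx 2.35$, independent of $k$. The following is a minimal Python sketch of the Multiplicative Weights Algorithm (Hedge) in the experts setting discussed above. The learning rate eta = sqrt(2 ln(k) / T) and the {0, 1} cost model are illustrative assumptions consistent with the $\sqrt{\frac{T \ln k}{2}}$ upper bound, not necessarily the exact parameterization analyzed in the paper, and the random adversary in the driver is only for demonstration; it is not the optimal adversary constructed in the paper.

import math
import random

def multiplicative_weights(k, T, cost_fn):
    """Run MWA for T steps with k experts.

    cost_fn(t) must return a length-k list of per-expert costs in [0, 1]
    for step t (the adversary's choice). Returns (algorithm_cost,
    best_expert_cost) so the caller can compute the regret.
    """
    eta = math.sqrt(2.0 * math.log(k) / T)   # illustrative tuning
    weights = [1.0] * k
    alg_cost = 0.0
    cum_costs = [0.0] * k
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]  # play expert i with prob. probs[i]
        costs = cost_fn(t)
        alg_cost += sum(p * c for p, c in zip(probs, costs))  # expected cost this step
        for i in range(k):
            cum_costs[i] += costs[i]
            weights[i] *= math.exp(-eta * costs[i])  # multiplicative update
    return alg_cost, min(cum_costs)

if __name__ == "__main__":
    # Demonstration with a uniformly random {0, 1} adversary for k = 2 experts.
    k, T = 2, 10000
    alg, best = multiplicative_weights(
        k, T, lambda t: [random.randint(0, 1) for _ in range(k)])
    print("regret:", alg - best,
          "upper bound:", math.sqrt(T * math.log(k) / 2.0))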

[1]  Yuval Peres,et al.  Towards Optimal Algorithms for Prediction with Expert Advice , 2014, SODA.

[2]  Wang Feng,et al.  Online Learning Algorithms for Big Data Analytics: A Survey , 2015 .

[3]  Francesco Orabona,et al.  Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations , 2014, COLT.

[4]  Haipeng Luo,et al.  Towards Minimax Online Learning with Unknown Time Horizon , 2013, ICML.

[5]  Wouter M. Koolen The Pareto Regret Frontier , 2013, NIPS.

[6]  H. Brendan McMahan,et al.  Minimax Optimal Algorithms for Unconstrained Linear Optimization , 2013, NIPS.

[7]  Ohad Shamir,et al.  Relax and Randomize : From Value to Algorithms , 2012, NIPS.

[8]  Ambuj Tewari,et al.  Online Learning: Beyond Regret , 2010, COLT.

[9]  Ambuj Tewari,et al.  Online Learning: Random Averages, Combinatorial Parameters, and Learnability , 2010, NIPS.

[10]  Peter L. Bartlett,et al.  A Stochastic View of Optimal Regret through Minimax Duality , 2009, COLT.

[11]  Robert E. Schapire,et al.  Learning with continuous experts using drifting games , 2008, Theor. Comput. Sci.

[12]  Ambuj Tewari,et al.  Optimal Strategies and Minimax Lower Bounds for Online Convex Games , 2008, COLT.

[13]  Manfred K. Warmuth,et al.  When Random Play is Optimal Against an Adversary , 2008, COLT.

[14]  Elad Hazan,et al.  Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.

[15]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[16]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[17]  Nicolò Cesa-Bianchi,et al.  Analysis of two gradient-based algorithms for on-line regression , 1997, COLT '97.

[18]  Adam Tauman Kalai,et al.  Universal Portfolios With and Without Transaction Costs , 1997, COLT '97.

[19]  Manfred K. Warmuth,et al.  How to use expert advice , 1997, JACM.

[20]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[21]  David Haussler,et al.  Tight worst-case loss bounds for predicting with expert advice , 1994, EuroCOLT.

[22]  Vladimir Vovk,et al.  Aggregating strategies , 1990, COLT '90.

[23]  Manfred K. Warmuth,et al.  The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[24]  James Hannan,et al.  Approximation to Bayes Risk in Repeated Play , 1958 .