On the optimality of the Hedge algorithm in the stochastic regime

In this paper, we study the behavior of the Hedge algorithm in the online stochastic setting. We prove that anytime Hedge with a decreasing learning rate, one of the simplest algorithms for prediction with expert advice, is remarkably both worst-case optimal and adaptive to the easier stochastic and adversarial-with-a-gap regimes. This shows that, despite its small, non-adaptive learning rate, Hedge enjoys the same optimal regret guarantee in the stochastic case as recently introduced adaptive algorithms. Moreover, our analysis reveals qualitative differences with other variants of the Hedge algorithm, such as the fixed-horizon version (with constant learning rate) and the one based on the so-called "doubling trick", both of which fail to adapt to the easier stochastic setting. Finally, we determine the intrinsic limitations of anytime Hedge in the stochastic case, and discuss the improvements provided by more adaptive algorithms.
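To make the setting concrete, here is a minimal Python sketch of anytime Hedge: at each round it plays a distribution over the K experts proportional to exponentially weighted cumulative losses, using the standard decreasing learning rate eta_t = sqrt(ln(K) / t). The loss range [0, 1], the exact learning-rate constant, and the function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def anytime_hedge(loss_matrix):
    """Sketch of anytime Hedge with decreasing learning rate.

    loss_matrix: array of shape (T, K) with losses in [0, 1] (assumed range).
    At round t the algorithm plays weights proportional to
    exp(-eta_t * cumulative loss), with eta_t = sqrt(ln(K) / t).
    Returns the regret against the best expert in hindsight.
    """
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)   # cumulative loss of each expert
    algo_loss = 0.0          # cumulative expected loss of the algorithm
    for t in range(1, T + 1):
        eta_t = np.sqrt(np.log(K) / t)                     # decreasing learning rate
        w = np.exp(-eta_t * (cum_loss - cum_loss.min()))   # shifted for numerical stability
        p = w / w.sum()                                    # distribution over experts
        losses = loss_matrix[t - 1]
        algo_loss += p @ losses                            # expected loss this round
        cum_loss += losses
    return algo_loss - cum_loss.min()                      # regret vs. best expert
```

For instance, feeding `anytime_hedge` a T-by-K matrix of i.i.d. Bernoulli losses with one expert having a smaller mean simulates the stochastic regime discussed above, where the regret is expected to grow much more slowly than the worst-case sqrt(T ln K) rate.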
