Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem

We study minimax strategies for the online prediction problem with expert advice. It has been conjectured that a simple adversary strategy, called COMB, is near optimal in this game for any number of experts $K$. Our results and new insights make progress in this direction by showing that, up to a small additive term, COMB is minimax optimal in the finite-time three-expert problem. In addition, we provide a new near minimax optimal COMB-based learner for this setting. Prior to this work, learners obtaining the optimal multiplicative constant in their regret rate were known only when $K = 2$ or $K \to \infty$. We characterize, when $K = 3$, the regret of the game as scaling as $\sqrt{8/(9\pi)\,T} \pm \log(T)^2$, which gives for the first time the optimal constant in the leading ($\sqrt{T}$) term of the regret.
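
To give a feel for the magnitudes involved, the following minimal sketch evaluates the leading term $\sqrt{8/(9\pi)\,T}$ and the $\log(T)^2$ additive band for a given horizon. The function names are hypothetical, and two assumptions are made for illustration only: the logarithm is taken to be natural, and the additive term is used with constant 1, whereas the abstract does not specify either.

```python
import math


def leading_regret_term(horizon: int) -> float:
    """Leading term sqrt(8 / (9 * pi) * T) of the three-expert regret."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive integer")
    return math.sqrt(8.0 / (9.0 * math.pi) * horizon)


def regret_band(horizon: int) -> tuple[float, float]:
    """Illustrative band: leading term -/+ log(T)^2.

    Assumption: natural logarithm and constant 1 on the additive term;
    the abstract only states the order of this term.
    """
    center = leading_regret_term(horizon)
    slack = math.log(horizon) ** 2
    return center - slack, center + slack


if __name__ == "__main__":
    for T in (10**2, 10**4, 10**6):
        low, high = regret_band(T)
        print(f"T={T}: leading={leading_regret_term(T):.2f}, band=({low:.2f}, {high:.2f})")
```

The multiplicative constant is $\sqrt{8/(9\pi)} \approx 0.532$, so the leading term grows as roughly $0.532\sqrt{T}$, while the additive band grows only polylogarithmically and becomes negligible relative to it for large $T$.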
