Improved second-order bounds for prediction with expert advice

This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
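To make the setting concrete, the following is a minimal sketch of the exponentially weighted average forecaster on signed payoffs, together with the external regret it incurs. The function name `exp_weights_forecaster`, the fixed learning rate `eta`, and the toy payoff sequence are assumptions made for illustration. In particular, the sketch takes `eta` as given and does not reproduce the tuning that lets the analysis dispense with prior knowledge of the payoff range.

```python
import math


def exp_weights_forecaster(payoffs, eta):
    """Run an exponentially weighted average forecaster.

    payoffs: list of rounds; each round is a list of N real payoffs
             (positive or negative), one per action.
    eta:     fixed positive learning rate (an assumption of this sketch).

    Returns (forecaster_payoff, best_action_payoff, regret), where regret
    is the best action's cumulative payoff minus the forecaster's.
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions  # cumulative payoff of each action
    forecaster_total = 0.0

    for round_payoffs in payoffs:
        # Weights are proportional to exp(eta * cumulative payoff).
        # Subtracting the maximum avoids overflow and leaves the
        # normalized probabilities unchanged.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]

        # The forecaster receives the weighted average payoff.
        forecaster_total += sum(p * x for p, x in zip(probs, round_payoffs))

        # Payoffs are revealed and the cumulative totals are updated.
        cumulative = [c + x for c, x in zip(cumulative, round_payoffs)]

    best = max(cumulative)
    return forecaster_total, best, best - forecaster_total


# Toy sequence with both gains and losses (hypothetical data).
toy_payoffs = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
forecaster_payoff, best_payoff, regret = exp_weights_forecaster(toy_payoffs, eta=0.5)
print(forecaster_payoff, best_payoff, regret)
```

Because the weights are normalized, adding the same constant to every action's payoff in a round leaves the sampling probabilities of this sketch unchanged, which is the simplest instance of the translation stability mentioned above.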
