How to Better Use Expert Advice

Abstract

This work is concerned with online learning from expert advice. Extensive work on this problem generated numerous "expert advice algorithms" whose total loss is provably bounded above in terms of the loss incurred by the best expert in hindsight. Such algorithms were devised for various problem variants corresponding to various loss functions. For some loss functions, such as the square, Hellinger and entropy losses, optimal algorithms are known. However, for two of the most widely used loss functions, namely the 0/1 and absolute loss, there are still gaps between the known lower and upper bounds.

In this paper we present two new expert advice algorithms and prove for them the best known 0/1 and absolute loss bounds. Given an expert advice algorithm ALG, the goal is to form an upper bound on the regret $L_{ALG} - L^*$ of ALG, where $L_{ALG}$ is the loss of ALG and $L^*$ is the loss of the best expert in hindsight. Typically, regret bounds of a "canonical form" $C \cdot \sqrt{L^* \ln N}$ are sought, where $N$ is the number of experts and $C$ is a constant. So far, the best known constant for the absolute loss function is $C = 2.83$, which is achieved by the recent IAWM algorithm of Auer et al. (2002). For the 0/1 loss function no bounds of this canonical form are known and the best known regret bound is

$$L_{ALG} - L^* \leq L^* + C_1 \ln N + C_2 \sqrt{L^* \ln N + \frac{e}{4} \ln^2 N},$$

where $C_1 = e - 2$ and $C_2 = 2\sqrt{e}$. This bound is achieved by a "P-norm" algorithm of Gentile and Littlestone (1999). Our first algorithm is a randomized extension of the "guess and double" algorithm of Cesa-Bianchi et al. (1997). While the guess and double algorithm achieves a canonical regret bound with $C = 3.32$, the expected regret of our randomized algorithm is canonically bounded with $C = 2.49$ for the absolute loss function. The algorithm utilizes one random choice at the start of the game.
A deficiency of our first algorithm, shared with the deterministic guess and double algorithm, is that it occasionally restarts itself and therefore "forgets" what it learned. Our second algorithm does not forget and enjoys the best known asymptotic performance guarantees for both the absolute and 0/1 loss functions. Specifically, in the case of the absolute loss, our algorithm is canonically bounded with $C$ approaching $\sqrt{2}$, and in the case of the 0/1 loss, with $C$ approaching $3/\sqrt{2} \approx 2.12$. In the 0/1 loss case the algorithm is randomized and the bound is on the expected regret.
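
To make the quantities in the abstract concrete, the following minimal sketch measures the regret $L_{ALG} - L^*$ of a generic exponentially weighted average forecaster under the absolute loss and evaluates the canonical form $C \cdot \sqrt{L^* \ln N}$. It is not either of the two algorithms proposed in the paper: the forecaster is the standard textbook scheme with a fixed learning rate, and the names `exp_weights_absolute_loss`, `canonical_bound` and the parameter `eta` are hypothetical, chosen only for illustration.

```python
import math


def exp_weights_absolute_loss(expert_preds, outcomes, eta):
    """Run a fixed-rate exponentially weighted average forecaster.

    expert_preds[t][i] is expert i's prediction in [0, 1] at round t,
    outcomes[t] is the outcome in [0, 1], and eta > 0 is an assumed,
    fixed learning rate (the paper's algorithms tune this differently).
    Returns (L_ALG, L_star): the forecaster's cumulative absolute loss
    and the cumulative loss of the best expert in hindsight.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    expert_loss = [0.0] * n_experts
    alg_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Predict with the weighted average of the experts' predictions.
        p = sum(w * x for w, x in zip(weights, preds)) / total
        alg_loss += abs(p - y)
        # Penalize each expert exponentially in its own absolute loss.
        for i, x in enumerate(preds):
            loss = abs(x - y)
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return alg_loss, min(expert_loss)


def canonical_bound(c, best_loss, n_experts):
    """Evaluate the canonical regret form C * sqrt(L* ln N)."""
    return c * math.sqrt(best_loss * math.log(n_experts))


if __name__ == "__main__":
    # Toy data: expert 0 is always right, expert 1 is always wrong.
    outcomes = [t % 2 for t in range(20)]
    expert_preds = [[y, 1 - y] for y in outcomes]
    l_alg, l_star = exp_weights_absolute_loss(expert_preds, outcomes, eta=1.0)
    print("regret:", l_alg - l_star)
    # Compare the constants quoted in the abstract for assumed L* and N.
    for c in (3.32, 2.83, 2.49, math.sqrt(2)):
        print(c, canonical_bound(c, best_loss=1000.0, n_experts=100))
```

Because the canonical form scales linearly in $C$, moving from $C = 3.32$ to $C$ approaching $\sqrt{2}$ shrinks the guaranteed regret by the same ratio for any fixed $L^*$ and $N$.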

[1]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[2]  Avrim Blum,et al.  On-line Algorithms in Machine Learning , 1996, Online Algorithms.

[3]  D. Blackwell An analog of the minimax theorem for vector payoffs. , 1956 .

[4]  Vladimir Vovk,et al.  Aggregating strategies , 1990, COLT '90.

[5]  Claudio Gentile  The Robustness of the p-Norm Algorithms , 2003 .

[6]  David Haussler,et al.  How to use expert advice , 1993, STOC.

[7]  David Haussler,et al.  Sequential Prediction of Individual Sequences Under General Loss Functions , 1998, IEEE Trans. Inf. Theory.

[8]  H. Chernoff Rational Selection of Decision Functions , 1954 .

[9]  Christos H. Papadimitriou,et al.  Games against nature , 1985, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[10]  Darrell D. E. Long,et al.  A dynamic disk spin-down technique for mobile computing , 1996, MobiCom '96.

[11]  Avrim Blum,et al.  On-line Learning and the Metrical Task System Problem , 1997, COLT '97.

[12]  Claudio Gentile,et al.  The Robustness of the p-Norm Algorithms , 1999, COLT '99.

[13]  Robert E. Tarjan,et al.  Amortized efficiency of list update and paging rules , 1985, CACM.

[14]  Neri Merhav,et al.  Universal prediction of individual sequences , 1992, IEEE Trans. Inf. Theory.

[15]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[16]  Adam Tauman Kalai,et al.  Finely-competitive paging , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[17]  Allan Borodin,et al.  Online computation and competitive analysis , 1998 .

[18]  T. Cover Universal Portfolios , 1996 .

[19]  Avrim Blum,et al.  Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain , 2004, Machine Learning.

[20]  Robert E. Schapire,et al.  Predicting Nearly As Well As the Best Pruning of a Decision Tree , 1995, COLT '95.

[21]  Alfredo De Santis,et al.  Learning probabilistic prediction functions , 1988, [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science.

[22]  Claudio Gentile,et al.  Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..