Derandomizing Stochastic Prediction Strategies

In this paper we continue the study of games of prediction with expert advice with uncountably many experts. A convenient interpretation of such games is to construe the pool of experts as one “stochastic predictor”, who chooses one of the experts in the pool at random according to the prior distribution on the experts and then replicates the (deterministic) predictions of the chosen expert. We notice that if the stochastic predictor's total loss is at most $L$ with probability at least $p$, then the learner's loss can be bounded by $cL + a \ln \frac{1}{p}$ for the usual constants $c$ and $a$. This interpretation is used to revamp known results and obtain new results on tracking the best expert. It is also applied to merging overconfident experts and to fitting polynomials to data.
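The bound can be checked numerically in a simple special case. The sketch below is an illustration only and is not taken from the paper: it assumes a finite pool of experts, binary outcomes, and the log-loss game, where the learner is the Bayes mixture and the constants are $c = a = 1$. The function names `mixture_log_loss` and `stochastic_bound` and the example data are hypothetical.

```python
import math


def mixture_log_loss(prior, expert_probs, outcomes):
    """Run the Bayes mixture over a finite pool under log loss.

    prior[i]           prior weight of expert i (weights sum to 1)
    expert_probs[i][t] probability expert i gives to outcome 1 at step t
    outcomes[t]        observed outcome, 0 or 1
    Returns (learner_loss, expert_losses).
    """
    weights = list(prior)
    expert_losses = [0.0] * len(prior)
    learner_loss = 0.0
    for t, y in enumerate(outcomes):
        # Probability each expert assigned to the outcome that occurred.
        likes = [p[t] if y == 1 else 1.0 - p[t] for p in expert_probs]
        # The learner predicts with the weighted average of the experts.
        mix = sum(w * l for w, l in zip(weights, likes)) / sum(weights)
        learner_loss += -math.log(mix)
        for i, l in enumerate(likes):
            expert_losses[i] += -math.log(l)
            weights[i] *= l  # Bayesian (unnormalized) weight update
    return learner_loss, expert_losses


def stochastic_bound(prior, expert_losses, L, c=1.0, a=1.0):
    """Return c*L + a*ln(1/p), where p is the prior probability that the
    stochastic predictor (an expert drawn from the prior) has loss <= L.
    c = a = 1 is the log-loss case assumed in this sketch."""
    p = sum(w for w, loss in zip(prior, expert_losses) if loss <= L)
    return math.inf if p == 0 else c * L + a * math.log(1.0 / p)


# Hypothetical data: four constant experts and a short binary sequence.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
constants = [0.9, 0.6, 0.5, 0.2]
expert_probs = [[q] * len(outcomes) for q in constants]
prior = [0.25] * 4

learner_loss, expert_losses = mixture_log_loss(prior, expert_probs, outcomes)
bounds = [stochastic_bound(prior, expert_losses, L) for L in expert_losses]
```

Under these assumptions the learner's loss never exceeds any entry of `bounds`: the mixture's loss is $-\ln \sum_i w_i e^{-L_i}$, and keeping only the experts with loss at most $L$ gives $L + \ln \frac{1}{p}$.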
