Probability theory for the Brier game

The usual theory of prediction with expert advice does not differentiate between good and bad "experts": its typical results only assert that it is possible to efficiently merge not too extensive pools of experts, no matter how good or how bad they are. On the other hand, it is natural to expect that good experts' predictions will in some way agree with the actual outcomes (e.g., they will be accurate on the average). In this paper we show that, in the case of the Brier prediction game (also known as the square-loss game), the predictions of a good (in some weak and natural sense) expert must satisfy the law of large numbers (both strong and weak) and the law of the iterated logarithm; we also show that two good experts' predictions must be in asymptotic agreement. To help the reader's intuition, we give a Kolmogorov-complexity interpretation of our results. Finally, we briefly discuss possible extensions of our results to more general games; the limit theorems for sequences of events in conventional probability theory correspond to the log-loss game.
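For concreteness, here is a short LaTeX sketch of the Brier setting and of the shape the limit theorems above take in it. The notation (binary outcomes \omega_i, probability forecasts p_i) and the exact form of the bounds follow the classical martingale versions of these laws and are an illustrative reconstruction, not a quotation of the paper's statements; in particular, the precise "goodness" condition on the expert is defined in the paper itself.

% Brier (square) loss of a forecast p \in [0,1] on an outcome \omega \in \{0,1\}:
\[
  \lambda(\omega, p) \;=\; (\omega - p)^2 .
\]
% Strong law of large numbers: a good expert's prediction errors average out.
\[
  \frac{1}{n} \sum_{i=1}^{n} \bigl( \omega_i - p_i \bigr) \;\longrightarrow\; 0
  \qquad (n \to \infty).
\]
% Law of the iterated logarithm (classical martingale form, stated here with
% the accumulated variance A_n = \sum_{i \le n} p_i (1 - p_i) assumed to
% tend to infinity):
\[
  \limsup_{n \to \infty}
  \frac{\bigl| \sum_{i=1}^{n} (\omega_i - p_i) \bigr|}
       {\sqrt{\,2 A_n \ln \ln A_n\,}}
  \;\le\; 1 .
\]

(For comparison, the log-loss game mentioned at the end of the abstract scores a forecast by \lambda(\omega, p) = -\omega \ln p - (1-\omega) \ln(1-p).)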
