The problem of sequential probability forecasting is considered in the most general setting: a model set $C$ is given, and the goal is to predict as well as possible when any of the measures (environments) in $C$ is chosen to generate the data. No assumptions whatsoever are made on the model class $C$: in particular, no independence or mixing assumptions are imposed; $C$ may not be measurable; there may be no predictor whose loss is sublinear; and so on. It is shown that the cumulative loss of any possible predictor can be matched by that of a Bayesian predictor whose prior is discrete and concentrated on $C$, up to an additive term of order $\log n$, where $n$ is the time step. The bound holds for every $n$ and every measure in $C$. This is the first non-asymptotic result of this kind. In addition, a non-matching lower bound is established: it goes to infinity with $n$, but it may do so arbitrarily slowly.
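As a hedged illustration of why discrete-prior Bayesian mixtures are natural here (the weights $w_k$, the notation $x_{1..n}$, and the log-loss convention are assumed for this sketch, not taken from the statement above): for a discrete prior with weights $w_k > 0$, $\sum_k w_k \le 1$, concentrated on measures $\mu_1, \mu_2, \ldots \in C$, the Bayesian mixture $\rho(x_{1..n}) = \sum_k w_k \, \mu_k(x_{1..n})$ satisfies $\rho(x_{1..n}) \ge w_k \, \mu_k(x_{1..n})$ for every $k$, and hence, for every sequence $x_{1..n}$,

$$ -\log \rho(x_{1..n}) \;\le\; -\log \mu_k(x_{1..n}) + \log \frac{1}{w_k}. $$

Thus, under cumulative log-loss, the mixture is within an additive constant of every component the prior covers. The result stated above is of a different, stronger kind: the mixture's loss matches that of an arbitrary predictor on every measure in $C$, at the price of an additive $O(\log n)$ term rather than a constant.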