Predicting a binary sequence almost as well as the optimal biased coin

We apply the exponential weight algorithm, introduced by Littlestone and Warmuth [26] and by Vovk [35], to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes algorithm with the Jeffreys prior, which was studied by Xie and Barron [38] under probabilistic assumptions. We derive a uniform bound on the regret which holds for any sequence. We also show that if the empirical distribution of the sequence is bounded away from 0 and from 1, then, as the length of the sequence increases to infinity, the difference between this bound and a corresponding bound on the average-case regret of the same algorithm (which is asymptotically optimal in that case) is only 1/2. We show that this gap of 1/2 is necessary by calculating the regret of the min-max optimal algorithm for this problem and showing that the asymptotic upper bound is tight. We also study the application of this algorithm to the square loss and show that the algorithm derived in this case is different from the Bayes algorithm and is better than it for prediction in the worst case.
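For the logarithmic loss, the Bayes algorithm with the Jeffreys (Beta(1/2, 1/2)) prior is the familiar Krichevsky-Trofimov "add-1/2" estimator: after seeing k ones in t bits, it predicts the next bit is 1 with probability (k + 1/2)/(t + 1). The sketch below (an illustration of this standard estimator, not code from the paper; the function names are ours) compares its cumulative log loss on a short sequence to that of the best biased coin chosen in hindsight:

```python
import math

def kt_predictor_loss(sequence):
    """Cumulative log loss (in nats) of the Bayes predictor under the
    Jeffreys prior, i.e. the Krichevsky-Trofimov estimator: after k ones
    in t bits, predict P(next bit = 1) = (k + 1/2) / (t + 1)."""
    ones = 0
    total_loss = 0.0
    for t, bit in enumerate(sequence):
        p_one = (ones + 0.5) / (t + 1)
        p = p_one if bit == 1 else 1.0 - p_one
        total_loss += -math.log(p)
        ones += bit
    return total_loss

def best_coin_loss(sequence):
    """Log loss of the best biased coin in hindsight, whose bias is the
    empirical frequency of ones in the sequence."""
    n, k = len(sequence), sum(sequence)
    q = k / n
    if q == 0.0 or q == 1.0:
        return 0.0  # a deterministic coin predicts such a sequence perfectly
    return -(k * math.log(q) + (n - k) * math.log(1 - q))

seq = [1, 0, 1, 1, 0, 1, 1, 1]
regret = kt_predictor_loss(seq) - best_coin_loss(seq)
# The regret grows only like (1/2) ln n plus a constant, uniformly over sequences.
```

For this 8-bit sequence the regret is about 1.30 nats, consistent with the (1/2) ln n + O(1) behavior that the uniform bound in the abstract quantifies.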

[1] M. A. Girshick, et al., Theory of Games and Statistical Decisions, 1955.

[2] D. Blackwell, An analog of the minimax theorem for vector payoffs, 1956.

[3] N. G. de Bruijn, Asymptotic Methods in Analysis, 1958.

[4] James Hannan, et al., Approximation to Bayes Risk in Repeated Play, 1958.

[5] Jaroslav Kožešník, et al., Information Theory, Statistical Decision Functions, Random Processes, 1962.

[6] Thomas M. Cover, et al., Behavior of sequential predictors of binary sequences, 1965.

[7] J. Florentin, et al., Handbook of Mathematical Functions, 1966.

[8] Irene A. Stegun, et al., Handbook of Mathematical Functions, 1966.

[9] Gerald S. Rogers, et al., Mathematical Statistics: A Decision Theoretic Approach, 1967.

[10] Lee D. Davisson, et al., Universal noiseless coding, 1973, IEEE Trans. Inf. Theory.

[11] J. Bernardo, Reference Posterior Distributions for Bayesian Inference, 1979.

[12] Glen G. Langdon, et al., Universal modeling and coding, 1981, IEEE Trans. Inf. Theory.

[13] Rajesh Sharma, et al., Asymptotic Analysis, 1986.

[14] Nick Littlestone, Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm (Extended Abstract), 1987, FOCS.

[15] Vladimir Vovk, et al., Aggregating Strategies, 1990, Annual Conference on Computational Learning Theory.

[16] Dean Phillips Foster, Prediction in the Worst Case, 1991.

[17] A. Barron, et al., Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.

[18] Neri Merhav, et al., Universal prediction of individual sequences, 1992, IEEE Trans. Inf. Theory.

[19] David Haussler, et al., How to use expert advice, 1993, STOC '93.

[20] Ming Li, et al., An Introduction to Kolmogorov Complexity and Its Applications, 1993, Texts and Monographs in Computer Science.

[21] Manfred K. Warmuth, et al., The Weighted Majority Algorithm, 1994, Inf. Comput.

[22] J. Hartigan, et al., Discrete noninformative priors, 1994.

[23] David Haussler, et al., Tight worst-case loss bounds for predicting with expert advice, 1994, EuroCOLT.

[24] Frans M. J. Willems, et al., The context-tree weighting method: basic properties, 1995, IEEE Trans. Inf. Theory.

[25] J. Suzuki, Some Notes on Universal Noiseless Coding, 1995.

[26] Vladimir Vovk, et al., A game of prediction with expert advice, 1995, COLT '95.

[27] Erik Ordentlich, et al., Universal portfolios with side information, 1996, IEEE Trans. Inf. Theory.

[28] T. Cover, Universal Portfolios, 1996.

[29] Jorma Rissanen, et al., Fisher information and stochastic complexity, 1996, IEEE Trans. Inf. Theory.

[30] Manfred K. Warmuth, et al., How to use expert advice, 1997, JACM.

[31] S. Hart, et al., A Simple Adaptive Procedure Leading to Correlated Equilibrium, 1997.

[32] David Haussler, A general minimax result for relative entropy, 1997, IEEE Trans. Inf. Theory.

[33] William I. Gasarch, et al., Book Review: An Introduction to Kolmogorov Complexity and Its Applications, Second Edition, 1997, by Ming Li and Paul Vitányi (Springer Graduate Text Series), 1997, SIGACT News.

[34] Paul M. B. Vitányi, et al., An Introduction to Kolmogorov Complexity and Its Applications, 1997, Graduate Texts in Computer Science.

[35] Andrew R. Barron, et al., Minimax redundancy for the class of memoryless sources, 1997, IEEE Trans. Inf. Theory.

[36] Kenji Yamanishi, A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.

[37] Jorma Rissanen, The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.

[38] Jorma Rissanen, et al., Stochastic Complexity in Statistical Inquiry, 1989, World Scientific Series in Computer Science.

[39] G. Lugosi, et al., On Prediction of Individual Sequences, 1998.