On Prediction of Individual Sequences

Sequential randomized prediction of an arbitrary binary sequence is investigated. No assumption is made on the mechanism of generating the bit sequence. The goal of the predictor is to minimize its relative loss (or regret), i.e., to make almost as few mistakes as the best “expert” in a fixed, possibly infinite, set of experts. We point out a surprising connection between this prediction problem and empirical process theory. First, in the special case of static (memoryless) experts, we completely characterize the minimax regret in terms of the maximum of an associated Rademacher process. Then we show general upper and lower bounds on the minimax regret in terms of the geometry of the class of experts. As main examples, we determine the exact order of magnitude of the minimax regret for the class of autoregressive linear predictors and for the class of Markov experts.
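As a rough illustration of the static-expert case, the sketch below estimates the expected maximum of a Rademacher process over a small finite class of static experts by Monte Carlo. The normalization E[max_f Σ_t (1/2 − f_t) σ_t], the function name `rademacher_max`, and the two-expert example class are assumptions made for this illustration; the abstract does not state the exact formula, so this should be read as a sketch rather than the paper's definition.

```python
import random


def rademacher_max(experts, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_f sum_t (1/2 - f_t) * sigma_t].

    experts: list of static experts, each a list of n predictions in [0, 1]
             (a static expert's prediction at time t ignores past outcomes).
    The (1/2 - f_t) normalization is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(experts[0])
    total = 0.0
    for _ in range(trials):
        # Draw independent Rademacher signs (+1 or -1 with equal probability).
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Maximum of the process over the (finite) expert class.
        total += max(
            sum((0.5 - f_t) * s for f_t, s in zip(f, sigma))
            for f in experts
        )
    return total / trials


n = 100
# Hypothetical class: one expert always predicts 0, the other always predicts 1.
constant_experts = [[0.0] * n, [1.0] * n]
estimate = rademacher_max(constant_experts)
print(f"estimated expected maximum for n={n}: {estimate:.2f}")
```

For this two-expert class the maximum equals |σ_1 + ... + σ_n| / 2, so the estimate grows on the order of √n, which matches the usual scale of regret against a finite class.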
