On sequential prediction of individual sequences relative to a set of experts

We investigate sequential randomized prediction of an arbitrary sequence taking values from a binary alphabet. The goal of the predictor is to minimize his Hamming loss relative to the loss of the best “expert” in a fixed, possibly infinite, set of experts. We point out a surprisingly close connection between the prediction problem and empirical process theory. Using ideas and results from empirical process theory, we show upper and lower bounds on the minimax relative loss in terms of the geometry of the class of experts. As a main example, we determine the exact order of magnitude of the minimax relative loss for the class of Markov experts. Furthermore, in the special case of static experts, we completely characterize the minimax relative loss in terms of the maximal deviation of an associated Rademacher process.
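The static-expert characterization can be checked numerically on small cases. The sketch below is illustrative only: the abstract does not state the formula, so it assumes the commonly cited form of the result, in which the minimax relative loss for a class of static experts over n rounds equals n/2 minus the expected loss of the best expert on a uniformly random binary sequence, or equivalently the expected maximum of a Rademacher sum. The function names and the brute-force enumeration are hypothetical choices made for this example, and only binary-valued experts are handled.

```python
from itertools import product


def hamming_loss(expert, outcomes):
    """Number of rounds in which a static expert's prediction differs from the outcome."""
    return sum(1 for f, y in zip(expert, outcomes) if f != y)


def value_loss_form(experts):
    """n/2 minus the average, over all 2^n binary sequences, of the best expert's loss.

    Assumed form of the static-expert minimax relative loss (not stated in the abstract).
    """
    n = len(experts[0])
    total_best_loss = sum(
        min(hamming_loss(expert, outcomes) for expert in experts)
        for outcomes in product((0, 1), repeat=n)
    )
    return n / 2 - total_best_loss / 2 ** n


def value_rademacher_form(experts):
    """Average over all sign vectors of max_f sum_t (f_t - 1/2) * Z_t, with Z_t = 2*y_t - 1."""
    n = len(experts[0])
    total = 0.0
    for outcomes in product((0, 1), repeat=n):
        signs = [2 * y - 1 for y in outcomes]  # map bits {0, 1} to signs {-1, +1}
        total += max(
            sum((f - 0.5) * z for f, z in zip(expert, signs))
            for expert in experts
        )
    return total / 2 ** n


if __name__ == "__main__":
    # Two constant experts ("always 0" and "always 1") over n = 3 rounds.
    experts = [(0, 0, 0), (1, 1, 1)]
    print(value_loss_form(experts), value_rademacher_form(experts))
```

The two functions agree because, for a binary prediction f and outcome y, the per-round loss equals 1/2 - (f - 1/2)(2y - 1); summing over rounds and maximizing over experts turns the loss form into the Rademacher form. Enumeration costs 2^n evaluations, so this is only practical for small n; a Monte Carlo average over random sign vectors would be the natural substitute for longer horizons.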
