Worst-Case Bounds for the Logarithmic Loss of Predictors

We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self-information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out that the popular Bayesian weighted-average algorithm can behave suboptimally.
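For concreteness, the quantities mentioned in the abstract are usually set up as follows (the notation here is a sketch of the standard formulation, not quoted from the paper). A predictor is a probability assignment q on sequences x^n = (x_1, ..., x_n), its cumulative logarithmic loss is -log q(x^n), and its worst-case regret against a class F of predictors is

\[
R_n(q,\mathcal{F}) \;=\; \sup_{x^n}\Bigl(-\log q(x^n) \;-\; \inf_{f\in\mathcal{F}}\bigl(-\log f(x^n)\bigr)\Bigr)
\;=\; \sup_{x^n}\,\log\frac{\sup_{f\in\mathcal{F}} f(x^n)}{q(x^n)}.
\]

Shtarkov's theorem states that, over a finite alphabet, the minimax value of this regret is attained by the normalized maximum-likelihood distribution \(q^*(x^n) = \sup_{f\in\mathcal{F}} f(x^n) \big/ \sum_{y^n}\sup_{f\in\mathcal{F}} f(y^n)\) and equals

\[
\inf_q\,\sup_{x^n}\,\log\frac{\sup_{f\in\mathcal{F}} f(x^n)}{q(x^n)}
\;=\; \log \sum_{x^n} \sup_{f\in\mathcal{F}} f(x^n).
\]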
