On competitive prediction and its relation to rate-distortion theory

Consider the normalized cumulative loss of a predictor $F$ on the sequence $x^n = (x_1, \ldots, x_n)$, denoted $L_F(x^n)$. For a set of predictors $G$, let $L(G, x^n) = \min_{F \in G} L_F(x^n)$ denote the loss of the best predictor in the class on $x^n$. Given the stochastic process $X = X_1, X_2, \ldots$, we consider $E\,L(G, X^n)$, termed the competitive predictability of $G$ on $X^n$. Our interest is in the optimal predictor set of size $M$, i.e., the predictor set achieving $\min_{|G| \le M} E\,L(G, X^n)$. When $M$ is subexponential in $n$, simple arguments show that $\min_{|G| \le M} E\,L(G, X^n)$ coincides, for large $n$, with the Bayesian envelope $\min_F E\,L_F(X^n)$. We investigate the behavior, for large $n$, of $\min_{|G| \le e^{nR}} E\,L(G, X^n)$, which we term the competitive predictability of $X$ at rate $R$. We show that whenever $X$ has an autoregressive representation via a predictor with an associated independent and identically distributed (i.i.d.) innovation process, its competitive predictability is given by the distortion-rate function of that innovation process. Indeed, it will be argued that by viewing $G$ as a rate-distortion codebook and the predictors in it as codewords allowed to base the reconstruction of each symbol on the past unquantized symbols, the result can be considered the source-coding analog of Shannon's classical result that feedback does not increase the capacity of a memoryless channel. For a general process $X$, we show that the competitive predictability is lower-bounded by the Shannon lower bound (SLB) on the distortion-rate function of $X$ and upper-bounded by the distortion-rate function of any (not necessarily memoryless) innovation process through which $X$ has an autoregressive representation. Thus, the competitive predictability is also precisely characterized whenever $X$ can be autoregressively represented via an innovation process for which the SLB is tight. The error exponent, i.e., the exponential behavior of $\min_{|G| \le e^{nR}} \Pr(L(G, X^n) > d)$, is also characterized for processes that can be autoregressively represented with an i.i.d. innovation process.
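To make the statements above concrete, here is a hedged formalization of the main claims. It assumes, consistently with the definitions above, that $L_F$ is a normalized single-letter loss, $L_F(x^n) = \frac{1}{n}\sum_{t=1}^{n} \rho(x_t, F(x^{t-1}))$ for a distortion measure $\rho$, and that the autoregressive representation is additive; the symbols $f_t$, $N$, and $D_N(\cdot)$ are notation introduced here for illustration, not taken verbatim from the source.

```latex
% Assumed autoregressive representation with i.i.d. innovations
% (additive form taken for concreteness):
%   X_t = f_t(X^{t-1}) + N_t,   N_1, N_2, ... i.i.d.
%
% Main result: the competitive predictability of X at rate R equals the
% distortion-rate function of the innovation process,
\[
  \lim_{n \to \infty} \; \min_{|G| \le e^{nR}} E\, L(G, X^n) \;=\; D_N(R).
\]
% For a general process X, the sandwich bound stated in the abstract reads
\[
  D^{\mathrm{SLB}}_X(R)
  \;\le\;
  \lim_{n \to \infty} \; \min_{|G| \le e^{nR}} E\, L(G, X^n)
  \;\le\;
  D_N(R),
\]
% where the left side is the Shannon lower bound on the distortion-rate
% function of X, and the right side holds for any (not necessarily
% memoryless) innovation process N through which X has an autoregressive
% representation. The two sides meet, and the competitive predictability is
% exactly characterized, whenever the SLB is tight for some such N.
```

As a numerical sanity check (a sketch under the stated assumptions, not code from the paper): for a Gaussian AR(1) source under squared-error loss, the innovation distortion-rate function is $D_N(R) = \sigma^2 2^{-2R}$, so at $R = 0$ (a single predictor) the formula recovers the Bayesian envelope $\sigma^2$, attained by the conditional-mean predictor. The parameter names and simulation below are illustrative choices.

```python
# Sketch: verify that the conditional-mean predictor of a Gaussian AR(1)
# source attains the Bayesian envelope sigma^2 = D_N(0), and tabulate the
# innovation distortion-rate function D_N(R) = sigma^2 * 2**(-2R).
import numpy as np

rng = np.random.default_rng(0)
a, sigma, n = 0.9, 1.0, 200_000  # illustrative AR coefficient, noise std, length

# Generate the AR(1) path X_t = a*X_{t-1} + N_t with i.i.d. N_t ~ N(0, sigma^2).
noise = rng.normal(0.0, sigma, size=n)
x = np.empty(n)
x[0] = noise[0]
for t in range(1, n):
    x[t] = a * x[t - 1] + noise[t]

# Normalized cumulative squared loss of the single best predictor
# F(x^{t-1}) = a * x_{t-1}; this should be close to sigma^2.
loss = np.mean((x[1:] - a * x[:-1]) ** 2)
print(f"empirical loss of F(x^(t-1)) = a*x_(t-1): {loss:.4f}")
print(f"Bayesian envelope D_N(0) = sigma^2:      {sigma**2:.4f}")

# Distortion-rate function of the Gaussian innovations at a few rates.
for R in (0.25, 0.5, 1.0):
    print(f"D_N({R}) = sigma^2 * 2^(-2R) = {sigma**2 * 2 ** (-2 * R):.4f}")
```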
