Universal prediction of individual binary sequences in the presence of noise

The problem of predicting the next outcome of an individual binary sequence from noisy observations of its past is considered. The goal of the predictor is to perform, on every individual sequence, almost as well as the best expert in a given reference set, where performance is measured by a general loss function. A comprehensive approach to prediction in this noisy setting is presented and shown to be efficient under appropriate conditions. To illustrate the applicability of the approach in concrete situations, two important special cases are treated explicitly. The first is the case of binary-valued corrupting noise, where each observed bit is the bitwise XOR of the clean bit and a noise bit. The second is the case of real-valued additive noise. It is shown that even in this more challenging situation, where the information available to the predictor about the past sequence is incomplete, a predictor can be guaranteed to compete successfully with an entire set of experts in rather strong senses.
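The binary-noise setting the abstract describes can be made concrete with a toy exponential-weights forecaster that sees only XOR-corrupted bits. This is a minimal sketch in the spirit of Vovk-style aggregation, not the paper's actual construction; the two experts, the learning rate, and the deterministic noise pattern below are invented for the illustration.

```python
import math

# Clean individual sequence (arbitrary; unknown to the predictor).
clean = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 20  # length 200

# Binary corrupting noise: the predictor observes clean XOR noise.
# Here the noise flips every fifth bit, a deterministic stand-in
# for a memoryless binary noise process.
noisy = [b ^ (1 if t % 5 == 3 else 0) for t, b in enumerate(clean)]

# Two toy experts mapping the noisy past to a prediction in [0, 1]:
# expert 0 always predicts 1; expert 1 repeats the last noisy bit.
def expert_preds(past):
    return [1.0, float(past[-1]) if past else 0.5]

eta = 0.5                # learning rate (illustrative choice)
weights = [1.0, 1.0]     # exponential weights over the experts
pred_loss = 0.0          # cumulative squared loss of the aggregate
exp_loss = [0.0, 0.0]    # cumulative squared loss of each expert

for t, y in enumerate(clean):
    preds = expert_preds(noisy[:t])
    # Weighted-average prediction of the next bit.
    p = sum(w * f for w, f in zip(weights, preds)) / sum(weights)
    pred_loss += (p - y) ** 2            # loss against the *clean* bit
    for i, f in enumerate(preds):
        exp_loss[i] += (f - y) ** 2
        # Only the noisy bit is available for the weight update --
        # this incomplete information is exactly the paper's difficulty.
        weights[i] *= math.exp(-eta * (f - noisy[t]) ** 2)
```

With clean observations, the aggregate's cumulative loss would exceed the best expert's by at most a logarithmic-in-the-number-of-experts regret term; here the update must use `noisy[t]` in place of the clean bit, and establishing comparable guarantees in that case is what the paper's analysis is about.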
