On the optimality of symbol-by-symbol filtering and denoising

We consider the problem of optimally recovering a finite-alphabet discrete-time stochastic process {X_t} from its noise-corrupted observation process {Z_t}. In general, the optimal estimate of X_t depends on all the components of {Z_t} on which it is allowed to be based. We characterize nontrivial situations (i.e., beyond the case where (X_t, Z_t) are independent) in which optimum performance is attained by "symbol-by-symbol" operations (also known as "singlet decoding"), meaning that the optimum estimate of X_t depends solely on Z_t. For the case where {X_t} is a stationary binary Markov process corrupted by a memoryless channel, we characterize the necessary and sufficient condition for optimality of symbol-by-symbol operations, both for the filtering problem (where the estimate of X_t may depend only on {Z_t' : t' ≤ t}) and for the denoising problem (where the estimate of X_t may depend on the entire noisy process). We then illustrate how our approach, which consists of characterizing the support of the conditional distribution of the noise-free symbol given the observations, can be used to characterize the entropy rate of a binary Markov process corrupted by the binary-symmetric channel (BSC) in various asymptotic regimes. For general noise-free processes (not necessarily Markov), general noise processes (not necessarily memoryless), and general index sets (random fields), we obtain an easily verifiable sufficient condition for the optimality of symbol-by-symbol operations and illustrate its use in a few special cases. For example, for binary processes corrupted by a BSC, we establish, under mild conditions, the existence of a δ* > 0 such that the "say-what-you-see" scheme is optimal whenever the channel crossover probability is less than δ*. Finally, we show that for the case of a memoryless channel the large-deviations (LD) performance of a symbol-by-symbol filter is easy to obtain, thus characterizing the LD behavior of the optimal schemes whenever these are singlet decoders (these constitute the only known cases where such an explicit characterization is available).
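
As a concrete (purely illustrative) companion to the binary Markov/BSC setting above, the following minimal sketch simulates a stationary symmetric binary Markov chain observed through a BSC and compares the "say-what-you-see" singlet denoiser against the optimal noncausal denoiser, which for this hidden Markov model is the posterior-marginal MAP rule computed by forward-backward smoothing. The parameter values p and delta, the sequence length, and all variable names are assumptions chosen for illustration; the sketch does not compute the paper's optimality threshold, it only probes one parameter pair empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values, not taken from the paper):
p = 0.2       # Markov transition probability P(X_{t+1} != X_t)
delta = 0.05  # BSC crossover probability
n = 100_000   # length of the simulated sequence

# Simulate a stationary symmetric binary Markov chain {X_t}
# (uniform initial distribution is stationary for the symmetric chain).
x = np.empty(n, dtype=int)
x[0] = rng.integers(2)
flips = rng.random(n - 1) < p
for t in range(1, n):
    x[t] = x[t - 1] ^ int(flips[t - 1])

# Corrupt it with a BSC(delta) to obtain the observation process {Z_t}.
z = x ^ (rng.random(n) < delta).astype(int)

# "Say-what-you-see" singlet denoiser: the estimate of X_t is Z_t itself,
# so its error rate concentrates around delta.
err_sws = np.mean(z != x)

# Optimal noncausal (denoising) estimate: forward-backward smoothing.
P = np.array([[1 - p, p], [p, 1 - p]])  # transition matrix
# lik[t, a] = P(Z_t = z_t | X_t = a) for a in {0, 1}
lik = np.where(z[:, None] == np.arange(2)[None, :], 1 - delta, delta)

alpha = np.empty((n, 2))                # normalized forward messages
alpha[0] = 0.5 * lik[0]
alpha[0] /= alpha[0].sum()
for t in range(1, n):
    alpha[t] = (alpha[t - 1] @ P) * lik[t]
    alpha[t] /= alpha[t].sum()

beta = np.empty((n, 2))                 # normalized backward messages
beta[-1] = 1.0
for t in range(n - 2, -1, -1):
    beta[t] = P @ (lik[t + 1] * beta[t + 1])
    beta[t] /= beta[t].sum()

# Posterior marginals and bitwise-MAP estimate (optimal under Hamming loss).
gamma = alpha * beta
x_hat = np.argmax(gamma, axis=1)
err_opt = np.mean(x_hat != x)

print(f"say-what-you-see error:    {err_sws:.4f}")
print(f"forward-backward MAP error: {err_opt:.4f}")
```

In the regime the abstract describes, with the crossover probability below the critical δ*, the smoother gains nothing over the identity map and the two empirical error rates should agree (both near delta); for larger delta, the forward-backward estimate is strictly better, which is easy to observe by rerunning the sketch with a noisier channel.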
