A measure of relative entropy between individual sequences with application to universal classification

A new notion of empirical informational divergence (relative entropy) between two individual sequences is introduced. If the two sequences are independent realizations of two finite-order, finite-alphabet, stationary Markov processes, the empirical relative entropy converges to the relative entropy almost surely. This empirical divergence is based on a version of the Lempel-Ziv data compression algorithm. A simple universal algorithm for classifying individual sequences into a finite number of classes, which is based on the empirical divergence, is introduced. The algorithm discriminates between the classes whenever they are distinguishable by some finite-memory classifier, for almost every given training set and almost any test sequence from these classes. It is universal in the sense that it is independent of the unknown sources.
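The divergence estimate admits a compact implementation. The sketch below is illustrative rather than the paper's reference code: it assumes the commonly cited form of the Ziv-Merhav estimate, Δ(z‖x) = [c(z|x)·log n − c(z)·log c(z)] / n, where n is the length of the test sequence z, c(z|x) is the number of phrases in the sequential cross-parsing of z into longest substrings of x, and c(z) is the number of phrases in the incremental (LZ78-style) self-parsing of z. All function names are hypothetical, and a naive quadratic substring search stands in for an efficient string-matching routine.

```python
import math

def lz78_phrase_count(z):
    """Number of phrases in the incremental (LZ78-style) self-parsing of z."""
    phrases = set()
    count = 0
    current = ""
    for ch in z:
        current += ch
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:  # trailing partial phrase, possibly a repeat
        count += 1
    return count

def cross_phrase_count(z, x):
    """Number of phrases when z is sequentially parsed into the longest
    substrings that occur somewhere in x (cross-parsing of z w.r.t. x)."""
    count = 0
    i, n = 0, len(z)
    while i < n:
        # Extend the current phrase while it still occurs as a substring of x.
        j = i + 1
        while j <= n and z[i:j] in x:
            j += 1
        # Longest match is z[i:j-1]; an unmatched symbol counts as one phrase.
        i = j - 1 if j - 1 > i else i + 1
        count += 1
    return count

def zm_divergence(z, x):
    """Empirical divergence estimate (assumed form; base-2 logs, so bits):
    Delta(z||x) = (c(z|x) * log n - c(z) * log c(z)) / n, for nonempty z."""
    n = len(z)
    c_cross = cross_phrase_count(z, x)
    c_self = lz78_phrase_count(z)
    return (c_cross * math.log2(n) - c_self * math.log2(c_self)) / n

def classify(test, training):
    """Assign test to the class label whose training sequence minimizes
    the empirical divergence Delta(test || training sequence)."""
    return min(training, key=lambda label: zm_divergence(test, training[label]))
```

Classification then reduces to computing the divergence of the test sequence against each class's training sequence and selecting the minimizer, e.g. classify(test, {"class_A": seq_a, "class_B": seq_b}); no statistics of the unknown sources enter beyond the training sequences themselves, which is the sense in which the rule is universal.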
