Classification With Finite Memory Revisited

We consider the class of strong-mixing probability laws with positive transitions that are defined on doubly infinite sequences over a finite alphabet $A$. A device called the classifier (or discriminator) observes a training sequence whose probability law $Q$ is unknown. The classifier's task is to consider a second probability law $P$ and decide whether $P = Q$, or whether $P$ and $Q$ are sufficiently different according to some appropriate criterion $\Delta(Q,P) > \Delta$. If the classifier has an infinite amount of training data available, this is a simple matter. Here, however, we study the case where the amount of training data is limited to $N$ letters. We define a function $N_\Delta(Q|P)$, which quantifies the minimum sequence length needed to distinguish $Q$ from $P$, and the class $M(N_\Delta)$ of all pairs of probability laws $(Q,P)$ that satisfy $N_\Delta(Q|P) \le N_\Delta$ for some given positive number $N_\Delta$. It is shown that every pair $(Q,P)$ of probability laws that are sufficiently different according to the $\Delta$ criterion is contained in $M(N_\Delta)$. We demonstrate that for any universal classifier there exists some $Q$ for which the probability of classification error $\lambda(Q) = 1$ for some $N$-sequence emerging from $Q$ and some $P$ with $(Q,P) \in M^{\circ}(N_\Delta)$ and $\Delta(Q,P) > \Delta$, if $N < N_\Delta$. Conversely, we introduce a classification algorithm that is essentially optimal in the sense that for every $(Q,P) \in M(N_\Delta)$, the probability of classification error $\lambda(Q)$ vanishes uniformly with $N$ for every $P$ with $(Q,P) \in M^{\circ}(N_\Delta)$, provided that $N \ge N_\Delta^{1+O(\log\log N_\Delta/\log N_\Delta)}$. The proposed algorithm finds the largest empirical conditional divergence over a set of contexts that appear in the tested $N$-sequence. The computational complexity of the classification algorithm is $O(N^2(\log N)^3)$. We also introduce a second, simplified context classification algorithm with a computational complexity of only $O(N(\log N)^4)$ that is efficient in the sense that for every pair $(Q,P) \in M(N_\Delta)$, the pairwise probability of classification error $\lambda(Q,P)$ vanishes with $N$ if $N \ge N_\Delta^{1+O(\log\log N_\Delta/\log N_\Delta)}$. Conversely, $\lambda(Q,P) = 1$ for at least some $(Q,P) \in M(N_\Delta)$ if $N < N_\Delta$.
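One plausible reading of the decision statistic described above, offered only as an illustration: assuming the contexts are strings $s$ occurring in the tested $N$-sequence and "conditional divergence" means the Kullback-Leibler divergence between the empirical conditional distribution under the training data and the corresponding conditional distribution under $P$, the rule would take a form such as

$$
\hat{\Delta}_N(P) \;=\; \max_{s \in \mathcal{S}_N} D\!\left(\hat{Q}_N(\cdot \mid s)\,\middle\|\,P(\cdot \mid s)\right),
\qquad
\text{declare } P \neq Q \ \text{ iff } \ \hat{\Delta}_N(P) > \Delta,
$$

where $\mathcal{S}_N$ is the set of contexts examined, $\hat{Q}_N(\cdot \mid s)$ is the empirical distribution of the next letter given context $s$ in the training sequence, and $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence. The symbols $\hat{\Delta}_N$, $\mathcal{S}_N$, and $\hat{Q}_N$ are illustrative notation and not taken from the paper.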
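A minimal code sketch of such a context-based test, under the same assumptions (fixed-length contexts, empirical conditional KL divergence, a fixed threshold $\Delta$). The function name and the parameters `context_len`, `p_cond`, `delta`, and `alphabet` are illustrative choices; this sketch does not reproduce the paper's context-selection rule or its stated complexity guarantees.

```python
from collections import Counter, defaultdict
from math import log

def context_divergence_test(train, context_len, p_cond, delta, alphabet):
    """Hypothetical sketch of a context-based classifier.

    train       : observed N-letter training sequence (string over `alphabet`)
    context_len : length of the contexts (preceding strings) examined
    p_cond      : p_cond(context, letter) -> conditional probability under P
                  (assumed strictly positive, matching the positive-transition model)
    delta       : the distinguishability threshold Delta
    Returns True if the empirical statistic exceeds delta (declare P != Q).
    """
    # Count (context, next letter) occurrences in the training sequence.
    counts = defaultdict(Counter)
    for i in range(context_len, len(train)):
        context = train[i - context_len:i]
        counts[context][train[i]] += 1

    # Largest empirical conditional KL divergence over the observed contexts.
    stat = 0.0
    for context, ctr in counts.items():
        total = sum(ctr.values())
        d = 0.0
        for letter in alphabet:
            q_hat = ctr[letter] / total      # empirical Q(letter | context)
            p = p_cond(context, letter)      # P(letter | context)
            if q_hat > 0.0:
                d += q_hat * log(q_hat / p)
        stat = max(stat, d)

    return stat > delta

# Usage example with a memoryless P assigning probability 1/2 to each letter.
if __name__ == "__main__":
    train = "abbaabbbabababba"
    print(context_divergence_test(
        train, context_len=2, p_cond=lambda s, a: 0.5, delta=0.1, alphabet="ab"))
```

In this toy form, all contexts of one fixed length are scanned; the algorithms in the paper are understood to work over a richer set of contexts of varying lengths, which is what yields the $O(N^2(\log N)^3)$ and $O(N(\log N)^4)$ complexity figures quoted above.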