Classification with finite memory

Consider the following situation. A device called a classifier observes a probability law P on l-vectors from an alphabet of size A. Its task is to observe a second probability law Q and decide whether P ≡ Q or whether P and Q are sufficiently different according to some appropriate criterion. If the classifier has unlimited memory available (so that it can remember P(z) exactly for all z), this is a simple matter. In fact, for most differentness criteria a finite memory of 2^{(log A)l + o(l)} bits suffices (for large l): it is enough to store a finite approximation of P(z) for each of the A^l vectors z. In a sense made precise in this paper, it is shown that a memory of only about 2^{Rl} bits is required, where R < log A and R is closely related to the entropy of P. Further, it is shown that if, instead of being given P(z) for all z, the classifier is given a training sequence drawn according to the probability law P, and that sequence can be stored using about 2^{Rl} bits, then correct classification is also possible.
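For concreteness, the following minimal Python sketch (not from the paper) illustrates the unlimited-memory baseline described above: it stores an empirical approximation of P(z) for every l-vector z and compares it against Q under one possible differentness criterion (L1 distance). The choice of criterion, the overlapping-block estimator, and the threshold value are illustrative assumptions.

```python
from collections import Counter

def empirical_law(seq, l):
    """Empirical probability law over l-vectors (overlapping blocks) of seq."""
    blocks = [tuple(seq[i:i + l]) for i in range(len(seq) - l + 1)]
    counts = Counter(blocks)
    n = len(blocks)
    return {z: c / n for z, c in counts.items()}

def l1_distance(p, q):
    """L1 distance between two laws on l-vectors (one possible differentness criterion)."""
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)

def naive_classifier(p, q, threshold=0.1):
    """Declare P and Q equivalent if their L1 distance is within `threshold`.

    This is the unlimited-memory baseline: it keeps an approximation of P(z)
    for every one of the A^l vectors z, i.e. on the order of 2^{(log A) l} values,
    rather than the roughly 2^{Rl} bits the paper shows to suffice.
    """
    return l1_distance(p, q) <= threshold

# Illustrative usage with binary sequences (A = 2) and block length l = 3.
P_hat = empirical_law([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], l=3)
Q_hat = empirical_law([1, 1, 1, 1, 0, 1, 1, 1, 0, 1], l=3)
print(naive_classifier(P_hat, Q_hat))
```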
