Classification with empirically observed statistics is studied for finite-alphabet sources. Efficient universal discriminant functions are described and shown to be related to universal data compression. It is demonstrated that if one of the probability measures of the two classes is unknown, it is still possible to define a universal discriminant function that performs as well as the optimal (likelihood-ratio) discriminant function, which can be evaluated only when the probability measures of both classes are available. If neither probability measure is available but training vectors from at least one of the two classes are, it is demonstrated that no discriminant function can perform efficiently if the length of the training vectors does not grow at least linearly with the length of the classified vector. A universal discriminant function is introduced and shown to perform efficiently when the length of the training vectors grows linearly with the length of the classified sequence, in the sense that it yields an error exponent arbitrarily close to that of the optimal discriminant function.
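The core construction, deciding class membership from the empirical statistics of a training vector rather than from known probability measures, can be illustrated with a toy type-based test in the spirit of Hoeffding. The sketch below assumes an i.i.d. finite-alphabet model; the paper's actual universal discriminant is built on universal data compression (Lempel-Ziv parsing), so the function names, add-one smoothing, and threshold here are illustrative assumptions, not the paper's construction.

```python
import math
from collections import Counter

def empirical_divergence(x, train, alphabet):
    """KL divergence D(P_x || Q_train) between the first-order empirical
    distribution of the test sequence x and the estimate learned from the
    training sequence. Add-one smoothing keeps the divergence finite."""
    n, m, k = len(x), len(train), len(alphabet)
    cx, ct = Counter(x), Counter(train)
    div = 0.0
    for a in alphabet:
        p = (cx[a] + 1) / (n + k)   # smoothed empirical probability in x
        q = (ct[a] + 1) / (m + k)   # smoothed estimate from the training data
        div += p * math.log(p / q)
    return div

def classify(x, train, alphabet, threshold):
    """Accept 'x belongs to the training class' iff the empirical divergence
    falls below the threshold (a type-based test needing no known measures)."""
    return empirical_divergence(x, train, alphabet) <= threshold

# Toy usage: a training vector from one biased binary source, then test
# sequences that match and mismatch its statistics.
alphabet = "01"
train = "0" * 700 + "1" * 300   # stands in for the class-1 training vector
print(classify("0" * 68 + "1" * 32, train, alphabet, 0.05))  # True: same statistics
print(classify("0" * 30 + "1" * 70, train, alphabet, 0.05))  # False: mismatched
```

The threshold trades the two error probabilities against each other; in the abstract's terms, the point is that such empirically driven tests can approach the error exponent of the likelihood-ratio test, provided the training vector grows linearly with the classified sequence.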