In this work we explore several questions arising in the Gaussian two-category classification problem when the common covariance matrix is unknown and must be estimated in order to approximate the decision hyperplane that is optimum for the true covariance matrix. Computed curves and, in some cases, closed-form expressions are presented that describe the performance of such adaptive systems as a function of data dimensionality and learning sample size. In particular, we show that a considerable improvement in performance can be realized through the use of some {\em a priori} knowledge of covariance matrix structure. We use the adaptive filter output signal-to-interference noise power ratio (SIR) as a measure of detection performance to compare estimators designed for a weakly stationary stochastic process (Toeplitz form covariance matrix) with an estimator designed for a general sample covariance matrix. The ratio of the expected SIR for a generalized covariance estimate to the SIR of a filter that is optimum for the true covariance matrix is shown to be $\cong [1 - N/N_{s} - 7/N_{s}]$, where $N$ is the filter (data) dimension and $N_{s}$ is the sample size. For a constrained Toeplitz form covariance matrix estimate, it is argued that the expected SIR approaches the optimum SIR as $\cong [1 - \pi'(N;B)/N_{s} - 7\pi'(N;B)/(N N_{s})]$, with $\pi'(N;B) \cong A + (B + 1) \ln N + 1/(2N)$, where $A \approx 0.577$ is Euler's constant and $B \geq 0$ is the input data correlation time normalized by the sampling period. Therefore, the constrained Toeplitz covariance matrix estimate appears to operate with an "effective sample size" of $N'_{s} \cong [N/\pi'(N;B)] N_{s}$, and it offers the potential of high expected SIR at a sample size $N_{s}$ for which the generalized estimator may provide poor results.
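As an illustration only, the approximations quoted above can be evaluated numerically. The sketch below is not taken from the paper: the function names are hypothetical, the term 1/2N is read as 1/(2N), and the expressions are approximations that are meaningful only when the sample size is large enough for the ratio to stay positive.

```python
import math

# Euler's constant, the quantity the abstract calls A (about 0.577).
EULER_GAMMA = 0.5772156649015329


def pi_prime(n, b):
    """pi'(N;B) ~ A + (B + 1) ln N + 1/(2N), with 1/2N read as 1/(2N)."""
    return EULER_GAMMA + (b + 1.0) * math.log(n) + 1.0 / (2.0 * n)


def sir_ratio_general(n, n_s):
    """Approximate expected-SIR / optimum-SIR for the general estimate."""
    return 1.0 - n / n_s - 7.0 / n_s


def sir_ratio_toeplitz(n, n_s, b):
    """Approximate expected-SIR / optimum-SIR for the Toeplitz estimate."""
    p = pi_prime(n, b)
    return 1.0 - p / n_s - 7.0 * p / (n * n_s)


def effective_sample_size(n, n_s, b):
    """Effective sample size N'_s ~ [N / pi'(N;B)] * N_s."""
    return n / pi_prime(n, b) * n_s


if __name__ == "__main__":
    n, n_s, b = 32, 64, 0.0  # illustrative values, not from the paper
    print("general :", sir_ratio_general(n, n_s))
    print("Toeplitz:", sir_ratio_toeplitz(n, n_s, b))
    print("N'_s    :", effective_sample_size(n, n_s, b))
```

For these illustrative values the general estimate gives a ratio of about 0.39 and the Toeplitz estimate about 0.92. Substituting the effective sample size into the general expression reproduces the Toeplitz expression exactly, which is the sense in which the two formulas are consistent.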
The effective sample size is shown to be {\em even greater} if the dimensionality of the constrained Toeplitz covariance matrix estimate is tailored to the input data correlation time, in the sense that only a band of lags about the main diagonal is estimated. In this case, $N'_{s} \cong (kN^{2})N_{s}$, where $k$ is a constant independent of $N$. An estimate of the computational error associated with the results, due primarily to a quadratic approximation employed for the SIR, is derived and compared with the normal statistical error. Because closed-form solutions for adaptive filter performance generally involve tedious calculations and complicated expressions, other workers have relied on a similar approximation, but without knowing the quality of the approximation for a given $N$ and $N_{s}$. Insight is also provided into other matters, such as the effect of the specific form of the category mean value vector (signal) on adaptation performance. We expect many application areas to benefit from the results presented here, including biomedical image recognition, earth resource satellite multispectral data classification, adaptive linear prediction, and adaptive antenna array processing. The impact of the results on array processing is discussed in some detail.
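One common way to impose Toeplitz structure, shown here purely as a sketch, is to average the sample covariance along each diagonal and, for the banded variant, to set lags beyond a chosen maximum to zero. This is an assumed, generic construction and may differ from the estimator analyzed in the paper; the function names and the `max_lag` parameter are hypothetical, and the data are assumed real and zero-mean.

```python
def sample_covariance(samples):
    """General sample covariance (1/N_s) * sum of x x^T, zero mean assumed."""
    n_s = len(samples)
    n = len(samples[0])
    return [[sum(x[i] * x[j] for x in samples) / n_s for j in range(n)]
            for i in range(n)]


def toeplitz_banded_estimate(cov, max_lag=None):
    """Average each diagonal of cov; lags above max_lag are set to zero.

    Assumed construction for illustration. Truncating lags can destroy
    positive definiteness, so a practical estimator may need a taper.
    """
    n = len(cov)
    if max_lag is None:
        max_lag = n - 1  # full Toeplitz estimate: keep every lag
    lags = []
    for lag in range(n):
        if lag > max_lag:
            lags.append(0.0)
        else:
            diagonal = [cov[i][i + lag] for i in range(n - lag)]
            lags.append(sum(diagonal) / len(diagonal))
    # Entry (i, j) depends only on |i - j|, which is the Toeplitz property.
    return [[lags[abs(i - j)] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    data = [[1.0, 0.5, 0.2], [-0.8, -0.3, 0.1], [0.4, 0.6, 0.5]]
    cov = sample_covariance(data)
    print(toeplitz_banded_estimate(cov))             # all lags estimated
    print(toeplitz_banded_estimate(cov, max_lag=1))  # band of lags 0 and 1
```

Lag 0 is averaged over N entries and lag N-1 over a single entry, so the low lags are estimated from many more products than any individual entry of the general sample covariance. Restricting the estimate to a narrow band discards the poorly averaged high lags.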