The Use of Shrinkage Estimators in Linear Discriminant Analysis

Probably the single most common discriminant algorithm in use today is the linear algorithm. Unfortunately, this algorithm has been shown to frequently behave poorly in high dimensions relative to other algorithms, even on suitable Gaussian data. This is because the algorithm uses sample estimates of the means and covariance matrix, which are of poor quality in high dimensions. It seems reasonable that if these unbiased estimates were replaced by estimates which are more stable in high dimensions, the resulting modified linear algorithm should be an improvement. This paper studies the use of a shrinkage estimate for the covariance matrix in the linear algorithm. We chose the linear algorithm not because we particularly advocate its use, but because its simple structure makes it easier to ascertain the effects of using shrinkage estimates. A simulation study assuming two underlying Gaussian populations with a common covariance matrix found the shrinkage algorithm to significantly outperform the standard linear algorithm in most cases. Several different means, covariance matrices, and shrinkage rules were studied. A nonparametric algorithm, which had previously been shown to usually outperform the linear algorithm in high dimensions, was included in the simulation study for comparison.
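To make the idea concrete, the following is a minimal sketch of a two-class linear discriminant in which the pooled sample covariance is shrunk toward a scaled identity before inversion. The paper studies several shrinkage rules; the particular rule used here, S_lam = (1 - lam) S + lam (tr(S)/p) I with a fixed lam, and all names in the code are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def shrinkage_lda(X1, X2, lam=0.2):
    """Linear discriminant using a shrunken pooled covariance estimate.

    X1, X2: (n_i, p) training samples from the two Gaussian populations.
    lam:    shrinkage weight in [0, 1]; lam = 0 recovers the standard
            linear algorithm (illustrative fixed value, not a studied rule).
    Returns a classifier mapping a p-vector to class label 1 or 2.
    """
    n1, p = X1.shape
    n2, _ = X2.shape
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled (unbiased) sample covariance of the two samples.
    S = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # Shrink toward a scaled identity; this stabilizes the inverse
    # when p is large relative to n1 + n2.
    S_lam = (1 - lam) * S + lam * (np.trace(S) / p) * np.eye(p)
    w = np.linalg.solve(S_lam, m1 - m2)   # discriminant direction
    c = w @ (m1 + m2) / 2                 # midpoint threshold
    return lambda x: np.where(x @ w > c, 1, 2)
```

In high dimensions the sample covariance is ill-conditioned (or singular when p exceeds the pooled sample size), so solving against the shrunken matrix `S_lam` rather than `S` is what keeps the discriminant direction stable.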
