The Treatment of Missing Values in Discriminant Analysis—I. The Sampling Experiment

Abstract: Probabilities of correct classification under several commonly used methods of handling missing values are studied by Monte Carlo methods. The methods are: use of only complete observation vectors; use of all observations with no replacement; substitution of means for missing observations; Buck's regression method; and Dear's principal component method. Discriminant functions were formed from independent random samples from two multivariate normal distributions with equal covariance matrices. Missing values occurred randomly in each variable and independently of missing values in other variables. For the cases considered, the mean substitution method and the principal component method are, in general, superior to the other methods.
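The kind of sampling experiment described above can be illustrated with a minimal sketch of one replication for the mean substitution method. Everything specific here is an assumption made for illustration and is not taken from the paper: two variables, an identity covariance matrix, group means (0, 0) and (2, 0), 50 training observations per group, a 20% missing rate, and the helper names themselves. The probability of correct classification is estimated on fresh complete observations.

```python
import random

def sample(mean, n, rng):
    # Independent unit-variance normals, so both groups share an identity covariance (assumed).
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]

def delete_at_random(data, rate, rng):
    # Each value goes missing independently of all others with probability `rate`.
    return [[None if rng.random() < rate else x for x in row] for row in data]

def mean_substitute(data):
    # Replace each missing value by the mean of the observed values of that variable.
    p = len(data[0])
    means = []
    for j in range(p):
        obs = [row[j] for row in data if row[j] is not None]
        means.append(sum(obs) / len(obs))
    filled = [[means[j] if row[j] is None else row[j] for j in range(p)] for row in data]
    return filled, means

def discriminant(g1, g2):
    # Linear discriminant w = S^-1 (m1 - m2), with S the pooled covariance (2 variables only).
    f1, m1 = mean_substitute(g1)
    f2, m2 = mean_substitute(g2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((f1, m1), (f2, m2)):
        for row in group:
            for a in range(2):
                for b in range(2):
                    s[a][b] += (row[a] - m[a]) * (row[b] - m[b])
    dof = len(f1) + len(f2) - 2
    s = [[v / dof for v in r] for r in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [(s[1][1] * d[0] - s[0][1] * d[1]) / det,
         (s[0][0] * d[1] - s[1][0] * d[0]) / det]
    c = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))  # midpoint cutoff
    return w, c

def run_experiment(seed, n_train=50, n_test=2000, rate=0.2):
    # One replication: train on incomplete data, score on fresh complete data.
    rng = random.Random(seed)
    mean1, mean2 = [0.0, 0.0], [2.0, 0.0]  # assumed population means
    g1 = delete_at_random(sample(mean1, n_train, rng), rate, rng)
    g2 = delete_at_random(sample(mean2, n_train, rng), rate, rng)
    w, c = discriminant(g1, g2)
    hits = 0
    for x in sample(mean1, n_test, rng):
        hits += (w[0] * x[0] + w[1] * x[1]) > c   # assign to group 1 above the cutoff
    for x in sample(mean2, n_test, rng):
        hits += (w[0] * x[0] + w[1] * x[1]) <= c  # assign to group 2 otherwise
    return hits / (2 * n_test)

if __name__ == "__main__":
    print(run_experiment(seed=0))
```

Under these assumed parameters the populations are two standard deviations apart, so the best attainable probability of correct classification is about 0.84. Averaging `run_experiment` over many seeds, and repeating with other missing-value methods in place of `mean_substitute`, would give a comparison in the spirit of the study.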
