Imputation of missing categorical data by maximizing internal consistency

This paper proposes a method for replacing missing categorical data with “reasonable” imputations. The imputed values are chosen to maximize the internal consistency of the completed data, as measured by Guttman's squared correlation ratio. The text outlines a solution to the optimization problem, relates the method to the relevant psychometric theory, and studies some of its properties in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that level, the technique gives reasonable results for up to 10–15% missing data.
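To make the criterion concrete, the following is a minimal sketch in Python with NumPy, not the paper's own optimization algorithm. It assumes that the maximized squared correlation ratio of a completed data set equals the largest nontrivial eigenvalue of a homogeneity analysis (dual scaling) of its indicator matrix, and it simply tries every possible fill-in, which is feasible only for tiny examples. The function names `eta_squared` and `impute_by_consistency` and the use of `None` as the missing-value code are hypothetical choices made for illustration.

```python
from itertools import product

import numpy as np


def eta_squared(rows):
    """Largest nontrivial homogeneity-analysis eigenvalue of complete data.

    Assumption: this eigenvalue is taken as the maximized squared
    correlation ratio of the completed data.
    """
    data = np.asarray(rows)
    n, m = data.shape
    blocks = []
    for j in range(m):
        cats = np.unique(data[:, j])  # categories observed in variable j
        blocks.append((data[:, [j]] == cats[None, :]).astype(float))
    G = np.hstack(blocks)             # n x (total categories) indicator matrix
    d = G.sum(axis=0)                 # category frequencies, all > 0
    # Standardized residuals of the correspondence analysis of G.
    S = (G - d / n) / np.sqrt(m * d)
    return float(np.linalg.svd(S, compute_uv=False)[0] ** 2)


def impute_by_consistency(rows, missing=None):
    """Brute-force search for the fill-in that maximizes eta_squared."""
    holes = [(i, j) for i, r in enumerate(rows)
             for j, v in enumerate(r) if v is missing]
    # Candidate replacements: categories observed in the same variable.
    options = [sorted({r[j] for r in rows if r[j] is not missing})
               for _, j in holes]
    best_eta, best_rows = -1.0, None
    for fill in product(*options):
        trial = [list(r) for r in rows]
        for (i, j), v in zip(holes, fill):
            trial[i][j] = v
        eta = eta_squared(trial)
        if eta > best_eta:
            best_eta, best_rows = eta, trial
    return best_rows, best_eta


rows = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, None]]
completed, eta = impute_by_consistency(rows)
print(completed[3], round(eta, 6))
```

In this toy data set the three variables agree on every observed cell, so the search fills the missing cell with the category that keeps them in agreement. The number of candidate fill-ins grows exponentially with the number of missing cells, which is why a real application needs a proper optimization scheme rather than this enumeration.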
