Estimation of Missing Values for the Analysis of Incomplete Data

When only a few observations are missing from data that otherwise conform to a planned experimental design, a linear model is most simply fitted to the data by least squares using estimated values for the missing observations, since this makes the most effective use of the symmetry remaining in the experimental results. A comprehensive account of this procedure was given by Yates [12] in 1933. First, the data are completed by inserting estimates of the missing values, determined so that the residual sum of squares for the completed data is a minimum. The correct estimates of treatment effects and other parameters in the linear model are then given by the standard formulae for the experimental design, and the correct residual sum of squares for the incomplete data is given by the standard analysis of variance (with the degrees of freedom appropriately reduced). However, the other components of this analysis are only approximations (though usually reasonably good ones) to the corresponding components of a correct least squares analysis of variance, since the latter involves in principle the fitting of one or more auxiliary models, for which appropriate estimates of the missing values would be necessary. It should also be noted that the usual formulae for standard errors of treatment comparisons need to be modified to take into account the loss of precision due to the missing values. The non-standard aspects of this missing value procedure will be discussed by the present writer in two papers: this paper, which deals with the estimation of missing values (and so enables completion of the first phase of the analysis), and a second paper [11], which deals with correction of the analysis of variance and the derivation of standard errors. In connection with incomplete block designs, lattice squares,