SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data

A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for imputing missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software has been made available by, for example, Schafer (1998a, 1998b), and in SOLAS (2001) and S-Plus 6 for Windows (2001). However, these methods and software may be too complicated for a typical psychological researcher, who then depends on a trained statistician to impute the missing data. Even if available, this statistician may not always have enough time or may not be an experienced software user, so the researcher may decide simply to delete all incomplete observations. To help researchers impute scores using simple methods, two SPSS subroutines were written. They are intended to be easy to apply within SPSS, without expert help. The subroutine “tw” performs two-way imputation, and the subroutine “rf” performs response-function imputation. Both methods are described by Sijtsma and Van der Ark (2003). Simulation studies by Van der Ark and Sijtsma (in press) indicate that these imputation methods work rather well when applied to an approximately unidimensional set of items (i.e., items that measure the same construct). The subroutines allow the researcher to transform an SPSS data file with missing values (an incomplete data file) into an SPSS data file without missing values (a completed data file), which can then be used for further analysis. To run a subroutine, one must select the variables containing the missing scores to be imputed; some optional arguments can also be specified.
For two-way imputation, the most important optional argument pertains to changing or removing the random error that is added to the imputed values by default. For response-function imputation, the most important optional argument pertains to changing the minimum group size used for estimating the response function (see Sijtsma & Van der Ark, 2003).
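
To make the role of the random error concrete, the following Python sketch illustrates the idea behind two-way imputation. It is not the “tw” subroutine and does not reproduce its arguments. It assumes the usual two-way formulation, in which a missing score is replaced by the person mean plus the item mean minus the overall mean (all computed from the observed scores), optionally with a normally distributed error whose variance is estimated from the residuals of the observed scores, after which the result is rounded to an integer within the observed score range. The function name, its parameters, and the fallback to the overall mean for an empty row or column are hypothetical choices made for this illustration.

```python
import math
import random


def two_way_impute(data, add_error=True, seed=None):
    """Illustrative two-way imputation (hypothetical helper, not the SPSS 'tw' routine).

    data: list of rows (persons); each row is a list of item scores,
          with None marking a missing score.
    Returns a new list of rows in which every None has been replaced.
    """
    rng = random.Random(seed)
    n_items = len(data[0])

    observed = [x for row in data for x in row if x is not None]
    overall_mean = sum(observed) / len(observed)

    def mean_observed(values):
        vals = [v for v in values if v is not None]
        # Assumption: fall back to the overall mean if nothing was observed.
        return sum(vals) / len(vals) if vals else overall_mean

    person_means = [mean_observed(row) for row in data]
    item_means = [mean_observed([row[j] for row in data]) for j in range(n_items)]

    def expected(i, j):
        # Person mean + item mean - overall mean.
        return person_means[i] + item_means[j] - overall_mean

    # Residual standard deviation estimated from the observed cells only.
    residuals = [
        data[i][j] - expected(i, j)
        for i in range(len(data))
        for j in range(n_items)
        if data[i][j] is not None
    ]
    error_sd = math.sqrt(sum(r * r for r in residuals) / max(len(residuals) - 1, 1))

    lowest, highest = min(observed), max(observed)
    completed = []
    for i, row in enumerate(data):
        new_row = []
        for j, score in enumerate(row):
            if score is None:
                value = expected(i, j)
                if add_error:
                    value += rng.gauss(0.0, error_sd)
                # Round half up, then keep the score inside the observed range.
                score = min(max(math.floor(value + 0.5), lowest), highest)
            new_row.append(score)
        completed.append(new_row)
    return completed


incomplete = [[1, 2, 3], [2, None, 4], [3, 4, 5]]
completed_no_error = two_way_impute(incomplete, add_error=False)
completed_with_error = two_way_impute(incomplete, add_error=True, seed=1)
```

Calling the function with `add_error=False` corresponds to removing the random error, so the imputed score is fully determined by the observed means. With `add_error=True`, repeated runs with different seeds can give different imputed scores.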