Secondary Data Analysis When There Are Missing Observations

Abstract

A data set with missing observations is often completed by using imputed values. Our objective is to improve the practice of secondary data analysis by examining the interplay between different imputation techniques and the different methods that secondary data analysts use when a data set contains both observed and imputed values. Secondary data analysts typically either treat the completed data set as if it had only observed values or ignore the imputations and analyze only the observed values. The first objective of our research is to investigate how proceeding in these ways affects the properties of standard statistical techniques. We assume that the missing data cannot be regarded as missing at random (MAR), and that the secondary data analyst's objectives are confidence intervals for the regression coefficients in a simple linear regression. Standard, “general purpose” imputation methods are emphasized. The second objective is to investigate the performance of confidence intervals based on multiple im...