A missing values imputation method for time series data: an efficient method to investigate the health effects of sulphur dioxide levels

Environmental data contains lengthy records of sequential missing values. Practical problem arose in the analysis of adverse health effects of sulphur dioxide (SO2) levels and asthma hospital admissions for Sydney, Nova Scotia, Canada. Reliable missing values imputation techniques are required to obtain valid estimates of the associations with sparse health outcomes such as asthma hospital admissions. In this paper, a new method that incorporates prediction errors to impute missing values is described using mean daily average sulphur dioxide levels following a stationary time series with a random error. Existing imputation methods failed to incorporate the prediction errors. An optimal method is developed by extending a between forecast method to include prediction errors. Validity and efficacy are demonstrated comparing the performances with the values that do not include prediction errors. The performances of the optimal method are demonstrated by increased validity and accuracy of the β coefficient of the Poisson regression model for the association with asthma hospital admissions. Visual inspection of the imputed values of sulphur dioxide levels with prediction errors demonstrated that the variation is better captured. The method is computationally simple and can be incorporated into the existing statistical software. Copyright © 2009 John Wiley & Sons, Ltd.