Imputation of missing data in time series for air pollutants
Abstract: Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the mechanism generating the missing data, whereas validity began to degrade when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision across different settings with respect to the patterns of missing observations. Most of the imputations yielded valid results, even when data were missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.
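To make the idea of EM-based imputation under normality concrete, the following is a minimal sketch in Python for a deliberately simplified case: two series, one fully observed (`x`) and one with gaps (`y`, missing values marked as `None`), assumed jointly normal. It is not the algorithm implemented in mtsdi and it ignores the temporal filtering step mentioned in the abstract; the function name `em_impute_bivariate` and its parameters are hypothetical and chosen only for illustration.

```python
def em_impute_bivariate(x, y, max_iter=500, tol=1e-10):
    """Illustrative EM imputation for a bivariate normal model.

    Assumptions (not taken from the article): x is fully observed,
    y uses None for missing values, and no temporal structure is modelled.
    """
    n = len(x)
    obs = [i for i in range(n) if y[i] is not None]
    if len(obs) < 2:
        raise ValueError("need at least two observed y values")

    # x is complete, so its mean and variance are estimated once.
    mu_x = sum(x) / n
    var_x = sum((xi - mu_x) ** 2 for xi in x) / n
    if var_x == 0:
        raise ValueError("x must not be constant")

    # Starting values from the complete pairs only.
    k = len(obs)
    mx_obs = sum(x[i] for i in obs) / k
    mu_y = sum(y[i] for i in obs) / k
    var_y = sum((y[i] - mu_y) ** 2 for i in obs) / k
    cov = sum((x[i] - mx_obs) * (y[i] - mu_y) for i in obs) / k

    for _ in range(max_iter):
        # E-step: conditional mean and second moment of each missing y given x.
        beta = cov / var_x
        resid_var = max(var_y - beta * cov, 0.0)
        ey = [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x)
              for i in range(n)]
        ey2 = [ey[i] ** 2 + (0.0 if y[i] is not None else resid_var)
               for i in range(n)]

        # M-step: update the parameters from the expected sufficient statistics.
        new_mu_y = sum(ey) / n
        new_var_y = sum(ey2) / n - new_mu_y ** 2
        new_cov = sum(x[i] * ey[i] for i in range(n)) / n - mu_x * new_mu_y

        change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y),
                     abs(new_cov - cov))
        mu_y, var_y, cov = new_mu_y, new_var_y, new_cov
        if change < tol:
            break

    # Observed values are kept; gaps are filled with the conditional mean.
    beta = cov / var_x
    return [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x)
            for i in range(n)]


# Toy data (made up for illustration), with two gaps in y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, None, 9.2, 10.8, None, 15.1, 17.0]
filled = em_impute_bivariate(x, y)
```

In this simplified missing-data pattern the EM iterations converge to the regression of `y` on `x` fitted to the complete pairs, which gives a convenient check on the result. A method for real pollutant series would need to handle gaps in several variables at once and account for the temporal component.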