Related Papers

Imputation of missing data in time series for air pollutants

Abstract: Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the mechanism generating the missing data, whereas validity began to degrade when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision across different patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.
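To illustrate the core idea in the abstract, the following is a minimal sketch of EM-based imputation under a multivariate normal assumption. It is not the mtsdi implementation: it is written in Python with NumPy rather than R, it omits the temporal filtering step entirely, and the function name `em_impute` and its parameters are hypothetical names chosen for this example.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """Hypothetical helper: fill NaNs in X (rows = time points, columns = series)
    by EM under a multivariate normal model. No temporal component is modeled."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape

    # Initial guess: column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)
    centered = filled - mu
    sigma = centered.T @ centered / n

    for _ in range(n_iter):
        # E-step: conditional mean of missing entries given observed ones,
        # plus the conditional covariance that the fill-in alone would ignore.
        C = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the marginal.
                filled[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            B = S_mo @ np.linalg.pinv(S_oo)  # regression of missing on observed
            filled[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T

        # M-step: update the mean and covariance from the completed data.
        new_mu = filled.mean(axis=0)
        centered = filled - new_mu
        new_sigma = (centered.T @ centered + C) / n

        delta = np.max(np.abs(new_mu - mu)) + np.max(np.abs(new_sigma - sigma))
        mu, sigma = new_mu, new_sigma
        if delta < tol:
            break

    return filled, mu, sigma
```

Calling `em_impute(X_with_nans)` returns the completed matrix together with the estimated mean vector and covariance matrix. Observed values are left unchanged. Only the missing cells are replaced by their conditional expectations.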

References

[1] D. Cox et al. An Analysis of Transformations, 1964.

[2] A. Plaia et al. Single imputation method of missing values in environmental pollution data sets, 2006.

[3] Joseph L. Schafer et al. Analysis of Incomplete Multivariate Data, 1997.

[4] G. McLachlan et al. The EM algorithm and extensions, 1996.

[5] P. McCullagh et al. Generalized Linear Models, 1984.

[6] H. D. de Vet et al. Missing Data: A Systematic Review of How They Are Reported and Handled, 2012, Epidemiology.

[7] Paula Diehr et al. Imputation of missing longitudinal data: a comparison of methods, 2003, Journal of clinical epidemiology.

[8] M. Gorelick et al. Bias arising from missing data in predictive models, 2006, Journal of clinical epidemiology.

[9] T. Stijnen et al. Review: a gentle introduction to imputation of missing values, 2006, Journal of clinical epidemiology.

[10] R. Tibshirani et al. Generalized additive models for medical research, 1986, Statistical methods in medical research.

[11] E. Beale et al. Missing Values in Multivariate Analysis, 1975.

[12] D. Rubin et al. Statistical Analysis with Missing Data, 1989.

[13] S. F. Buck. A Method of Estimation of Missing Values in Multivariate Data Suitable for Use with an Electronic Computer, 1960.

[14] D. Rubin. Inference and Missing Data, 1975.

[15] R. R. Hocking et al. The analysis of incomplete data, 1971.

[16] Harri Niska et al. Methods for imputation of missing values in air quality data sets, 2004.

[17] Gwilym M. Jenkins et al. Time series analysis, forecasting and control, 1971.

[18] R Core Team et al. R: A language and environment for statistical computing, 2014.

[19] B. Silverman et al. Nonparametric Regression and Generalized Linear Models: A roughness penalty approach, 1993.

[20] O. Miettinen et al. Theoretical Epidemiology: Principles of Occurrence Research in Medicine, 1987.

[21] Roderick J. A. Little. Regression with Missing X's: A Review, 1992.

[22] David B. Dunson et al. Bayesian Data Analysis, 2010.

[23] S. Greenland et al. A critical look at methods for handling missing covariates in epidemiologic regression analyses, 1995, American journal of epidemiology.

[24] Gary W. Fuller et al. An empirical approach for the prediction of daily mean PM10 concentrations, 2002.

[25] Roderick J. A. Little et al. Statistical Analysis with Missing Data, 1988.

[26] F. Dominici et al. On the use of generalized additive models in time-series studies of air pollution and health, 2002, American journal of epidemiology.

[27] J. Schwartz et al. Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions, 1996, Journal of epidemiology and community health.

[28] C. Willmott. Some Comments on the Evaluation of Model Performance, 1982.

Cited by
Denoising Recurrent Neural Networks for Classifying Crash-Related Events
IEEE Transactions on Intelligent Transportation Systems
2020
Resilient Edge Data Management Framework
IEEE Transactions on Services Computing
2020
Application of a multi-stage neural network approach for time-series landfill gas modeling with missing data imputation.
Waste management
2020
Time Series Imputation Using Genetic Programming and Lagrange Interpolation
2016 5th Brazilian Conference on Intelligent Systems (BRACIS)
2016
Bidirectional Mean Distance Estimation: A New Gap filling Method
BRACIS
2019
Predicting Loss of Communication Between Radio Enabled Devices Using Deep Recurrent Neural Networks
2019
Non Linear Time Series Analysis of Air Pollutants with Missing Data
Advances in Neural Networks
2016
Metaheuristic approaches in biopharmaceutical process development data analysis
Bioprocess and Biosystems Engineering
2019
Short Term Electrical Energy Consumption Forecasting using RNN-LSTM
2019 International Conference on Electrical Engineering and Computer Science (ICECOS)
2019
Machine learning in telemetry data mining of space mission: basics, challenging and future directions
Artificial Intelligence Review
2019
Imputation methods for filling missing data in urban air pollution data for Malaysia
2018
Adaptive Recovery of Incomplete Datasets for Edge Analytics
2018 IEEE 2nd International Conference on Fog and Edge Computing (ICFEC)
2018
A wavelet lifting approach to long-memory estimation
Statistics and Computing
2016
Measurement protocols, random-variable-valued measurements, and response process error: Estimation and inference when sample data are not deterministic
PloS one
2020
Deep Air Quality Forecasts: Suspended Particulate Matter Modeling With Convolutional Neural and Long Short-Term Memory Networks
IEEE Access
2020
Imputation methods for addressing missing data in short-term monitoring of air pollutants.
The Science of the total environment
2020
A hybrid air quality early-warning framework: An hourly forecasting model with online sequential extreme learning machines and empirical mode decomposition algorithms.
The Science of the total environment
2019
Missing data imputation using fuzzy-rough methods
Neurocomputing
2016
Input-Adaptive Proxy for Black Carbon as a Virtual Sensor
Sensors
2019
On the Imputation of Missing Values in Univariate PM10 Time Series
EUROCAST
2017