Multiple imputation and functional methods in the presence of measurement error and missingness in explanatory variables

In many regression applications, explanatory variables (or covariates) may be imprecisely measured or may contain missing values. Although there is a vast literature on measurement error modeling to account for errors-in-variables, and on missing data methodology to handle missingness, very few methods have been developed to address both simultaneously. In this paper, we consider likelihood-based multiple imputation to handle missing data, and combine it with two well-known functional measurement error methods: simulation-extrapolation and corrected score. This unified approach has several appealing characteristics: the model fitting procedure is easy to understand, and off-the-shelf software can be incorporated into the modeling framework; neither calibration data nor a validation subset is required to fit the model; and the missing data component of the proposed approach is likelihood-based, which allows the use of standard likelihood machinery. We demonstrate our methods on simulated datasets and apply them to daily ozone pollution measurements in Los Angeles, where the observed covariates contain both missing values and imprecise measurements. We conclude that the proposed methods substantially reduce the bias and mean squared error of the estimated regression coefficients, compared with methods that ignore either measurement error or missingness in covariates.
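As a rough illustration of the general idea, and not the authors' implementation, the sketch below combines a simple imputation step with simulation-extrapolation for a linear regression with one error-prone covariate. Everything in it is an assumption made for illustration: the simulated data, the function names (`ols_slope`, `simex_slope`, `impute_once`), a known measurement error standard deviation `sigma_u`, a single-draw regression imputation standing in for the likelihood-based multiple imputation described above, and a quadratic extrapolant.

```python
import numpy as np

def ols_slope(w, y):
    """Slope from a simple least-squares fit of y on w."""
    return np.polyfit(w, y, 1)[0]

def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=50, rng=None):
    """Simulation-extrapolation estimate of the slope (quadratic extrapolant)."""
    rng = np.random.default_rng() if rng is None else rng
    means = []
    for lam in lambdas:
        # Simulation step: add extra noise with variance lam * sigma_u**2
        # and average the naive slope over n_sim remeasured datasets.
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        means.append(np.mean(sims))
    # Extrapolation step: fit a quadratic in lambda and evaluate at lambda = -1,
    # which corresponds to no measurement error.
    coef = np.polyfit(lambdas, means, 2)
    return np.polyval(coef, -1.0)

def impute_once(w, y, rng):
    """Fill missing w by one draw from a normal regression of w on y.

    Assumption: this is a simplified stand-in for likelihood-based imputation;
    it ignores uncertainty in the imputation model parameters.
    """
    obs = ~np.isnan(w)
    slope, intercept = np.polyfit(y[obs], w[obs], 1)
    resid_sd = np.std(w[obs] - (intercept + slope * y[obs]), ddof=2)
    w_imp = w.copy()
    miss = ~obs
    w_imp[miss] = intercept + slope * y[miss] + resid_sd * rng.standard_normal(miss.sum())
    return w_imp

# Hypothetical simulated data: true covariate x, error-prone surrogate w,
# with about 20% of w missing completely at random.
rng = np.random.default_rng(0)
n, beta1, sigma_u = 2000, 1.0, 0.5
x = rng.standard_normal(n)
y = 0.5 + beta1 * x + 0.3 * rng.standard_normal(n)
w = x + sigma_u * rng.standard_normal(n)
w[rng.random(n) < 0.2] = np.nan

M = 5  # number of imputed datasets
# Pooled point estimates: average over imputations (Rubin's rule for the estimate).
naive = np.mean([ols_slope(impute_once(w, y, rng), y) for _ in range(M)])
corrected = np.mean(
    [simex_slope(impute_once(w, y, rng), y, sigma_u, rng=rng) for _ in range(M)]
)
print("naive slope:", naive)
print("MI + SIMEX slope:", corrected)
```

In this toy setting the naive slope is attenuated toward zero, and the extrapolation step moves the pooled estimate back toward the true value. The quadratic extrapolant is only approximate, so some residual bias should be expected, and the sketch does not compute standard errors.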
