Multiple Overimputation: A Unified Approach to Measurement Error and Missing Data

Although social scientists devote considerable effort to mitigating measurement error during data collection, they usually ignore the issue during data analysis. And although many statistical methods have been proposed for reducing biases induced by measurement error, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a special case of extreme measurement error, and it corrects for both. Like MI, the proposed “multiple overimputation” (MO) framework is a simple two-step procedure. In the first step, multiple (≈ 5) completed copies of the data set are created in which cells measured without error are held constant, missing cells are imputed from the distribution of predicted values, and cells (or entire variables) with measurement error are “overimputed,” that is, imputed from the predictive distribution with observation-level priors defined by the mismeasured values and available external information, if any. In the second step, analysts run on each of the overimputed data sets whatever statistical method they would have used had there been no missingness or measurement error; the results are then combined via a simple averaging procedure. We also offer easy-to-use open-source software that implements all the methods described herein.

Word count: 8488 (excludes supplementary appendices to appear on the web)

∗For helpful comments, discussions, and data we thank Walter Mebane, Gretchen Casper, Simone Dietrich, Justin Grimmer, Sunshine Hillygus, Burt Monroe, Adam Nye, Michael Peress, Eric Plutzer, Mo Tavano, Shawn Treier, Joseph Wright, and Chris Zorn.
†Assistant Professor, Department of Political Science, University of Rochester, 322 Harkness Hall, Rochester, NY 14627 (m.blackwell@rochester.edu, http://www.mattblackwell.org)

‡Senior Research Scientist, Institute for Quantitative Social Science, 1737 Cambridge Street, Cambridge, MA 02138 (jhonaker@iq.harvard.edu, http://scholar.harvard.edu/honaker)

§Albert J. Weatherhead III University Professor, Harvard University, Institute for Quantitative Social Science, 1737 Cambridge Street, Cambridge, MA 02138 (king@harvard.edu, http://gking.harvard.edu)
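The two-step procedure described in the abstract can be illustrated with a small Python sketch. It is a toy under strong assumptions, not the authors' software. It handles a single normally distributed variable, and the predictive mean and variance (`prior_mean`, `prior_var`) are supplied by hand rather than estimated from a multivariate imputation model. It also assumes that the measurement error variance of each cell is known, and that the standard MI combining rules (Rubin's rules) carry over to the "simple averaging procedure". All function and variable names are hypothetical.

```python
import math
import random
import statistics


def overimpute_cell(observed, error_var, prior_mean, prior_var, rng):
    """Draw one value for a single cell under a toy normal model.

    observed is None -> missing cell: draw from the predictive distribution
    error_var == 0   -> measured without error: hold the value constant
    otherwise        -> "overimputed": draw from the predictive distribution
                        combined with an observation-level prior centered on
                        the mismeasured value (precision weighting)
    """
    if observed is None:
        return rng.gauss(prior_mean, math.sqrt(prior_var))
    if error_var == 0:
        return observed
    precision = 1.0 / prior_var + 1.0 / error_var
    post_mean = (prior_mean / prior_var + observed / error_var) / precision
    return rng.gauss(post_mean, math.sqrt(1.0 / precision))


def overimpute(column, error_vars, prior_mean, prior_var, m=5, seed=0):
    """Step 1: create m completed copies of one column."""
    rng = random.Random(seed)
    return [
        [overimpute_cell(x, v, prior_mean, prior_var, rng)
         for x, v in zip(column, error_vars)]
        for _ in range(m)
    ]


def combine(estimates, variances):
    """Step 2: combine per-copy results (assumed: Rubin's rules).

    Point estimate: average of the m estimates.
    Variance: average within-copy variance plus (1 + 1/m) times the
    between-copy variance of the estimates.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    return q_bar, within + (1.0 + 1.0 / m) * between


# Hypothetical data: one exact cell, one missing cell, two mismeasured cells.
column = [1.2, None, 0.7, 2.5]
error_vars = [0.0, None, 0.25, 0.25]
copies = overimpute(column, error_vars, prior_mean=1.0, prior_var=1.0)

# Run the same analysis (here, a sample mean) on every completed copy.
estimates = [statistics.fmean(c) for c in copies]
variances = [statistics.variance(c) / len(c) for c in copies]
q_bar, total_var = combine(estimates, variances)
```

As the error variance of a cell grows, the weight on the mismeasured value shrinks toward zero and the draw approaches the one used for a missing cell, which is the sense in which missingness is a special case of extreme measurement error.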
