Are All Biases Missing Data Problems?

Estimating causal effects is a frequent goal of epidemiologic studies. Three systematic threats to consistent estimation of causal effects are well established: confounding, selection bias, and measurement error. These have typically been characterized as distinct types of bias. However, each can also be characterized as a missing data problem that can be addressed with missing data methods. Here we describe how each of these systematic threats arises from missing data, and we review methods for reducing each type of bias along with their assumptions. We also link the assumptions made by the reviewed methods to the missing completely at random (MCAR) and missing at random (MAR) assumptions, which in the missing data framework allow valid inferences to be drawn from the observed, incomplete data.
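To make the missing data view of confounding concrete, the following is a minimal simulation sketch and is not taken from the article. It assumes a single binary confounder Z, a binary exposure A, and a true causal effect of 1.0; the function names (`simulate`, `naive_difference`, `ipw_difference`) and all parameter values are hypothetical choices for illustration. Each person has two potential outcomes, but only the one matching the exposure actually received is observed, so the other is missing. Because exposure depends on Z, that missingness is MAR given Z rather than MCAR, and weighting by the inverse probability of the observed exposure is one way to account for it.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (z, a, y) rows; the unobserved potential outcome is discarded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                    # confounder
        a = 1 if rng.random() < (0.8 if z else 0.2) else 0    # exposure depends on z
        y0 = z + rng.gauss(0.0, 1.0)                          # outcome if unexposed
        y1 = y0 + 1.0                                         # outcome if exposed (assumed effect = 1)
        y = y1 if a == 1 else y0                              # only one outcome is observed
        rows.append((z, a, y))
    return rows


def naive_difference(rows):
    """Difference in observed means; valid only if missingness were MCAR."""
    exposed = [y for _, a, y in rows if a == 1]
    unexposed = [y for _, a, y in rows if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_difference(rows):
    """Inverse probability weighted difference, assuming MAR given z."""
    # Estimate P(A = 1 | Z = z) within each stratum of z.
    p_exposed = {}
    for level in (0, 1):
        stratum = [a for z, a, _ in rows if z == level]
        p_exposed[level] = sum(stratum) / len(stratum)

    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for z, a, y in rows:
        # Probability of the exposure this person actually received.
        p_observed = p_exposed[z] if a == 1 else 1.0 - p_exposed[z]
        w = 1.0 / p_observed
        weighted_sum[a] += w * y
        weight_total[a] += w
    # Normalized (Hajek-style) weighted means for each exposure group.
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


if __name__ == "__main__":
    data = simulate()
    print("naive estimate:", round(naive_difference(data), 3))
    print("IPW estimate:  ", round(ipw_difference(data), 3))
```

In this setup the naive contrast drifts away from the assumed effect of 1.0 because exposed people are more likely to have Z = 1, while the weighted contrast should land close to 1.0. The weighting only helps here because Z is measured and every stratum has both exposed and unexposed people; with an unmeasured confounder the missingness would not be MAR given the observed data.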
