Selection Bias When Estimating Average Treatment Effects Using One-sample Instrumental Variable Analysis

Participants in epidemiologic and genetic studies are rarely true random samples of the populations they are intended to represent, and both known and unknown factors can influence whether someone participates in a study (a process known as selection into the study). Practitioners of instrumental variable (IV) analysis do not widely understand the circumstances in which selection biases an IV analysis. We use directed acyclic graphs (DAGs) to depict assumptions about the selection mechanism (the factors affecting selection) and show how DAGs can be used to determine when different selection mechanisms bias a two-stage least squares IV analysis. Through simulations, we show that selection can result in a biased IV estimate with substantial confidence interval (CI) undercoverage. We also show that the magnitude of the bias can depend on instrument strength, on whether the exposure–instrument association is linear or nonlinear, and on whether the exposure has a causal effect. We present an application using the UK Biobank study, which is known to be a selected sample of the general population. The question of interest was the causal effect of staying in school for at least 1 extra year on the decision to smoke. Based on 22,138 participants, the two-stage least squares exposure estimates differed markedly between the IV analysis that ignored selection and the IV analysis that adjusted for selection (for example, risk differences of 1.8% [95% CI, −1.5%, 5.0%] and −4.5% [95% CI, −6.6%, −2.4%], respectively). We conclude that selection bias can have a major effect on an IV analysis, and that further research is needed on how to conduct sensitivity analyses when selection depends on unmeasured data.
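The collider mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation design. It assumes a continuous outcome, a single normally distributed instrument `z`, an unmeasured exposure-outcome confounder `u`, and selection that depends on both exposure and outcome through a logistic model with a hypothetical strength parameter `gamma`. The helper names `tsls` and `simulate` are also hypothetical. The sketch compares three estimates: 2SLS in the full population, naive 2SLS in the selected sample, and 2SLS weighted by the inverse probability of selection. The weights use the true selection probabilities, which is an oracle assumption. In practice they must be estimated, and that requires information on non-participants.

```python
import numpy as np


def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def tsls(y, x, z, w=None):
    """(Weighted) two-stage least squares with an intercept.

    y, x, z are 1-D arrays (one outcome, one exposure, one instrument);
    w are optional observation weights (here: inverse selection probabilities).
    Returns the estimated coefficient on the exposure.
    """
    n = len(y)
    w = np.ones(n) if w is None else w
    X = np.column_stack([np.ones(n), x])  # second-stage regressors: intercept + exposure
    Z = np.column_stack([np.ones(n), z])  # instrument matrix: intercept + instrument
    ZtW = Z.T * w                         # Z'W without forming an n x n diagonal matrix
    # First stage: weighted projection of the regressors onto the instrument space
    pi = np.linalg.solve(ZtW @ Z, ZtW @ X)
    X_hat = Z @ pi
    # Second stage: weighted least squares of y on the fitted regressors
    XhW = X_hat.T * w
    beta = np.linalg.solve(XhW @ X_hat, XhW @ y)
    return beta[1]


def simulate(n=200_000, beta=0.5, gamma=0.4, seed=1):
    """Hypothetical data-generating process with outcome-dependent selection."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                  # instrument
    u = rng.standard_normal(n)                  # unmeasured confounder of exposure and outcome
    x = z + u + rng.standard_normal(n)          # exposure
    y = beta * x + u + rng.standard_normal(n)   # outcome; true effect of x is beta
    # Selection S depends on exposure and outcome, so S is a collider on Z -> X -> S <- Y
    p_sel = expit(gamma * (x + y))
    s = rng.random(n) < p_sel
    full = tsls(y, x, z)                        # no selection (target population)
    naive = tsls(y[s], x[s], z[s])              # selected sample, selection ignored
    # Oracle inverse-probability-of-selection weights (true p_sel assumed known)
    ipw = tsls(y[s], x[s], z[s], w=1.0 / p_sel[s])
    return full, naive, ipw


if __name__ == "__main__":
    full, naive, ipw = simulate()
    print(f"full population: {full:.3f}  naive selected: {naive:.3f}  IPW selected: {ipw:.3f}")
```

Under these assumptions, the full-population and weighted estimates should lie close to the true `beta = 0.5`. The naive selected-sample estimate should be pulled away from it, because conditioning on selection opens the non-causal path Z -> X -> S <- Y between the instrument and the outcome.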