Severity of bias of a simple estimator of the causal odds ratio in Mendelian randomization studies

Mendelian randomization studies estimate causal effects using genetic variants as instruments. Instrumental variable methods are straightforward for linear models, but epidemiologists often use odds ratios to quantify effects, and odds ratios are often the quantities reported in meta-analyses. Many applications of Mendelian randomization dichotomize genotype and estimate the population causal log odds ratio for a unit increase in exposure by dividing the genotype-disease log odds ratio by the difference in mean exposure between genotypes. This 'Wald-type' estimator is biased even in large samples, but whether the magnitude of the bias is of practical importance is unclear. We study the large-sample bias of this estimator in a simple model with a continuous, normally distributed exposure, a single unobserved confounder that is not an effect modifier, and interpretable parameters. We focus on parameter values that reflect scenarios in which Mendelian randomization is applied, including realistic values for the degree of confounding and the strength of the causal effect. We evaluate this estimator and the causal odds ratio using numerical integration, and obtain approximate analytic expressions to check results and gain insight. A small simulation study examines finite-sample bias and mild violations of the normality assumption. For our simple data-generating model, we find that the Wald-type estimator is asymptotically biased: the bias is around 10% in fairly typical Mendelian randomization scenarios but can be larger in more extreme situations. Recently developed methods such as structural mean models require fewer untestable assumptions, and we recommend their use when the individual-level data they require are available. The Wald-type estimator may retain a role as an approximate method for meta-analysis based on summary data.
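As a rough illustration of the Wald-type estimator described in the abstract, the sketch below divides a genotype-disease log odds ratio by the difference in mean exposure between two genotype groups. It is not the authors' code. The function names, the 2x2 count layout, and the numbers in the example call are hypothetical and were chosen only to show the arithmetic.

```python
import math


def log_odds_ratio(cases_1, controls_1, cases_0, controls_0):
    """Genotype-disease log odds ratio from a 2x2 table of counts.

    Group 1 and group 0 are the two levels of the dichotomized genotype.
    """
    odds_1 = cases_1 / controls_1
    odds_0 = cases_0 / controls_0
    return math.log(odds_1 / odds_0)


def wald_type_log_or(cases_1, controls_1, cases_0, controls_0,
                     mean_x_1, mean_x_0):
    """Wald-type estimate of the log odds ratio per unit increase in exposure.

    Numerator: genotype-disease log odds ratio.
    Denominator: difference in mean exposure between genotype groups.
    """
    mean_difference = mean_x_1 - mean_x_0
    if mean_difference == 0:
        raise ValueError("genotype groups have the same mean exposure")
    numerator = log_odds_ratio(cases_1, controls_1, cases_0, controls_0)
    return numerator / mean_difference


# Hypothetical summary data (illustrative values only, not from the paper).
estimate = wald_type_log_or(
    cases_1=120, controls_1=880,   # disease counts, genotype group 1
    cases_0=100, controls_0=900,   # disease counts, genotype group 0
    mean_x_1=1.5, mean_x_0=1.0,    # mean exposure in each group
)
print(f"Wald-type log odds ratio per unit exposure: {estimate:.3f}")
print(f"Corresponding odds ratio: {math.exp(estimate):.3f}")
```

Because the function needs only group-level counts and means, it fits the summary-data meta-analysis setting mentioned in the abstract. It does nothing to correct the large-sample bias that the abstract reports for this estimator.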
