A two‐sample robust Bayesian Mendelian Randomization method accounting for linkage disequilibrium and idiosyncratic pleiotropy with applications to the COVID‐19 outcomes

Mendelian randomization (MR) is a statistical method that exploits genetic variants as instrumental variables to estimate the causal effect of a modifiable risk factor on an outcome of interest. Although two-sample MR methods based on genome-wide association study (GWAS) summary-level data are widely used, they can suffer from power loss or biased inference when the chosen genetic variants are in linkage disequilibrium (LD) and when the variants have relatively large direct effects on the outcome whose distribution may be heavy-tailed, a phenomenon commonly referred to as idiosyncratic pleiotropy. To resolve these two issues, we propose a novel Robust Bayesian Mendelian Randomization (RBMR) model that uses the more robust multivariate generalized t-distribution to model such direct effects within a probabilistic framework that also incorporates the LD structure explicitly. The generalized t-distribution can be represented as a Gaussian scale mixture, so the model parameters can be estimated by EM-type algorithms. We compute standard errors by calibrating the evidence lower bound (ELBO) using the likelihood ratio test. Through extensive simulation studies, we show that RBMR performs robustly compared with competing methods. We also apply RBMR to two benchmark data sets and find that it has smaller bias and standard errors. Using RBMR, we find that coronary artery disease (CAD) is associated with an increased risk of coronavirus disease 2019 (COVID-19). We also provide a user-friendly R package, RBMR, for public use.
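
The Gaussian scale mixture representation mentioned above can be illustrated in the simplest univariate case. The following is a minimal sketch, not the RBMR implementation: it assumes the standard Student t special case, in which a latent weight w follows a Gamma distribution with shape nu/2 and rate nu/2, and the observation is Gaussian with variance 1/w given w. The function names (`sample_t_via_scale_mixture`, `sample_variance`) and the choice nu = 10 are hypothetical and used only for illustration. Conditioning on w leaves a Gaussian model, which is what makes EM-type updates tractable.

```python
import math
import random


def sample_t_via_scale_mixture(nu, n, seed=0):
    """Draw n values from a Student t with nu degrees of freedom
    using its Gaussian scale mixture representation (illustrative only)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Latent weight: Gamma(shape = nu/2, rate = nu/2).
        # random.gammavariate takes (shape, scale), so scale = 2/nu.
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)
        # Given w, the draw is Gaussian with variance 1/w.
        draws.append(rng.gauss(0.0, 1.0) / math.sqrt(w))
    return draws


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


# Assumed illustration value: nu = 10, whose theoretical variance is nu / (nu - 2).
nu = 10.0
draws = sample_t_via_scale_mixture(nu=nu, n=100_000, seed=1)
var_hat = sample_variance(draws)
var_theory = nu / (nu - 2.0)
```

Here `var_hat` should lie close to `var_theory`, which exceeds the unit variance of a standard Gaussian; small weights w inflate the conditional variance and produce the heavy tails that make the t-distribution robust to large direct effects.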
