Selecting likely causal risk factors from high-throughput experiments using multivariable Mendelian randomization

Modern high-throughput experiments provide a rich resource to investigate causal determinants of disease risk. Mendelian randomization (MR) is the use of genetic variants as instrumental variables to infer the causal effect of a specific risk factor on an outcome. Multivariable MR is an extension of the standard MR framework to consider multiple potential risk factors in a single model. However, current implementations of multivariable MR use standard linear regression and hence perform poorly when the number of risk factors is large. Here, we propose a two-sample multivariable MR approach based on Bayesian model averaging (MR-BMA) that scales to high-throughput experiments. In a realistic simulation study, we show that MR-BMA can detect true causal risk factors even when the candidate risk factors are highly correlated. We illustrate MR-BMA by analysing publicly available summarized data on metabolites to prioritise likely causal biomarkers for age-related macular degeneration.

Multivariable Mendelian randomization (MR) extends the standard MR framework to consider multiple risk factors in a single model. Here, Zuber et al. propose MR-BMA, a Bayesian variable selection approach to identify the likely causal determinants of a disease from many candidate risk factors, such as those measured in high-throughput data sets.
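To make the idea of model averaging over candidate risk factors concrete, the following is a minimal sketch and not the authors' implementation. It assumes two-sample summary data (variant associations with each risk factor and with the outcome), inverse-variance weighting by the outcome standard errors, a Zellner g-prior on the causal effects, and an independent Bernoulli prior on inclusion. The function name `bma_over_risk_factors`, the parameters `g`, `prior_prob` and `max_size`, and the simulated data are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def bma_over_risk_factors(beta_x, beta_y, se_y, max_size=3, g=None, prior_prob=0.1):
    """Toy Bayesian model averaging over subsets of risk factors.

    beta_x: (n_variants, n_factors) variant-risk factor associations
    beta_y: (n_variants,) variant-outcome associations
    se_y:   (n_variants,) standard errors of beta_y
    Assumes a g-prior on the effects (g defaults to n_variants) and a
    Bernoulli(prior_prob) inclusion prior; both are illustrative choices.
    """
    n, d = beta_x.shape
    g = float(n) if g is None else float(g)
    # Inverse-variance weighting: scale each row by 1 / se_y.
    X = beta_x / se_y[:, None]
    y = beta_y / se_y
    yty = float(y @ y)

    models, log_post = [], []
    for size in range(0, max_size + 1):
        for subset in itertools.combinations(range(d), size):
            r2 = 0.0  # the empty model explains nothing
            if size > 0:
                Xm = X[:, list(subset)]
                coef, *_ = np.linalg.lstsq(Xm, y, rcond=None)
                fitted = Xm @ coef
                r2 = float(fitted @ fitted) / yty  # uncentred R^2, no intercept
            # Log Bayes factor against the empty model under the g-prior.
            log_bf = -0.5 * size * np.log1p(g) - 0.5 * n * np.log1p(-g / (1.0 + g) * r2)
            log_prior = size * np.log(prior_prob) + (d - size) * np.log1p(-prior_prob)
            models.append(subset)
            log_post.append(log_bf + log_prior)

    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    post /= post.sum()

    # Marginal inclusion probability: total posterior mass of models containing each factor.
    mip = np.zeros(d)
    for subset, p in zip(models, post):
        mip[list(subset)] += p
    return models, post, mip


# Simulated example: six correlated risk factors, of which factors 0 and 3 are causal.
rng = np.random.default_rng(1)
n_variants, n_factors = 150, 6
shared = rng.normal(0.0, 0.1, size=(n_variants, 1))  # induces correlation between factors
beta_x = shared + rng.normal(0.0, 0.1, size=(n_variants, n_factors))
theta = np.array([0.5, 0.0, 0.0, -0.5, 0.0, 0.0])  # hypothetical true causal effects
se_y = np.full(n_variants, 0.01)
beta_y = beta_x @ theta + rng.normal(0.0, se_y)

models, post, mip = bma_over_risk_factors(beta_x, beta_y, se_y)
best = models[int(np.argmax(post))]
print("best model:", best)
print("marginal inclusion probabilities:", np.round(mip, 3))
```

Exhaustive enumeration, as used here, is only feasible for a handful of risk factors and a small `max_size`. With many candidates, a stochastic search over models would be needed instead.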
