Selecting instruments for Mendelian randomization in the wake of genome-wide association studies

Mendelian randomization (MR) studies typically assess the pathogenic relevance of environmental exposures or disease biomarkers, using genetic variants as instruments for these exposures. The approach is gaining popularity: our systematic review found a more than 10-fold increase in MR studies published between 2004 and 2015. When the MR paradigm was first proposed, few genetic variants related to biomarkers or exposures were known, and most had been identified by candidate gene studies. Genome-wide association studies (GWAS) now provide a rich source of potential instruments for MR analysis. However, many early reviews of the concept, applications and analytical aspects of MR preceded the surge in GWAS, so the question of how best to select instruments from the now extensive pool of available variants has received insufficient attention. Here we focus on the most common category of MR studies, those concerning disease biomarkers. We consider how selecting instruments for MR analysis from GWAS requires attention to: the assumptions underlying the MR approach; the biology of the biomarker; the genome-wide distribution, frequency and effect size of biomarker-associated variants (the genetic architecture); and the specificity of the genetic associations. On this basis, we develop guidance that may help investigators plan, and readers interpret, MR studies.
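To make the role of variant effect size concrete, the following is a minimal sketch, not the authors' method, of one common way GWAS summary statistics are turned into an MR estimate. It assumes per-variant effects on the biomarker (`beta_x`, `se_x`) and on the disease outcome (`beta_y`, `se_y`) are available, that the variants are independent (not in linkage disequilibrium) and allele-harmonized, and it uses two conventions chosen here for illustration: the rule-of-thumb per-variant F-statistic threshold of 10 and the fixed-effect inverse-variance-weighted (IVW) estimator. The class and function names are hypothetical, and the numbers are toy values, not data.

```python
"""Hypothetical sketch: screen GWAS variants for instrument strength, then
combine the retained variants with the fixed-effect IVW estimator."""
from dataclasses import dataclass
import math


@dataclass
class Variant:
    rsid: str        # hypothetical variant identifier
    beta_x: float    # per-allele effect on the biomarker (exposure)
    se_x: float      # standard error of beta_x
    beta_y: float    # per-allele effect on the disease outcome (e.g. log odds)
    se_y: float      # standard error of beta_y


def f_statistic(v: Variant) -> float:
    # Approximate per-variant F-statistic for instrument strength.
    return (v.beta_x / v.se_x) ** 2


def select_instruments(variants, f_threshold=10.0):
    # Assumed rule of thumb: keep only variants with F above the threshold.
    return [v for v in variants if f_statistic(v) > f_threshold]


def ivw_estimate(variants):
    # Weighted mean of the per-variant Wald ratios beta_y / beta_x, with
    # first-order weights beta_x**2 / se_y**2. Valid only if every variant
    # satisfies the MR assumptions (e.g. no pleiotropic effects on outcome).
    num = sum(v.beta_x * v.beta_y / v.se_y ** 2 for v in variants)
    den = sum(v.beta_x ** 2 / v.se_y ** 2 for v in variants)
    return num / den, math.sqrt(1.0 / den)


# Toy usage with made-up numbers.
toy = [
    Variant("snp1", 0.10, 0.01, 0.05, 0.01),
    Variant("snp2", 0.20, 0.02, 0.10, 0.02),
    Variant("snp3", 0.02, 0.01, 0.04, 0.02),  # weak instrument: F = 4
]
strong = select_instruments(toy)
est, se = ivw_estimate(strong)
print(f"{len(strong)} instruments; IVW estimate = {est:.3f} (SE {se:.3f})")
```

Strength screening addresses only one of the considerations listed above: a variant can pass the F-statistic filter and still be an invalid instrument if its association is not specific to the biomarker, which the IVW estimator does not detect.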
